Diskette Files to Accompany 8087 Applications and Programming


About The Diskette 

The diskette, which is available as an option with the book, contains all 
the programs listed in The Cookbook. Each program appears in three 
forms: as an assembly language source program (e.g., PROG.ASM); as 
an already assembled object module (e.g., PROG.OBJ); and as a file ready 
to be BLOADed into BASIC (e.g., PROG.SAV). In several cases, a number 
of programs have been combined into a single module for convenience. For
instance, the most important matrix manipulation routines are combined 
in files MATRIX.ASM, MATRIX.OBJ, and MATRIX.SAV. The programs 
are supplied on a standard single-sided, 5.25 inch diskette formatted on 
an IBM Personal Computer running PC-DOS 1.1. A copy of the diskette 
documentation appears at the end of the book. 





Introduction 


Let me explain why I'm excited about the 8087. I've used large computers 
for many years as a tool for professional research. When I bought my
personal computer, I found I could do many things far more conveniently
than on these larger machines. But I quickly discovered that my machine
wasn't fast enough for large scale numerical computing. 

Having an 8087 means that now I can solve many large problems on 
my personal computer. While some problems still belong on big machines 
(and always will), my personal computing horizon has expanded tenfold.

The 8087 isn't just fast; it's very easy to use. Whether you are mostly 
a "program user" or mostly a "program writer," you will find that the 
8087 is a remarkable device. I hope you will find 8087 Applications and
Programming an enjoyable, as well as an educational, introduction. 

Who is This Book For? 

• People who want to know what the 8087 will do (especially Part I, 
Chapters 1-4). 

• People who want to learn how to program the 8087 (especially Part 
II, Chapters 5-8 and Chapter 12 in Part III). 

• People who want prepared programs for number crunching applications
on their personal computer (especially Part III, Chapters 9-15).

Part I describes the capabilities of the 8087 at a fairly non-technical 
level. If you are considering buying an 8087 and want to know about 
8087-compatible hardware and software, Part I is for you.

Parts II and III are for the more technically inclined reader. While we 
"begin at the beginning," some prior programming experience is helpful. 
You needn't be an expert by any means, but this book isn't an Introduction
to Computers.

Part II (Chapters 5-8) provides an in-depth description of the 8087's 
instructions. We also discuss some of the fundamentals of assembly language
programming for the 8088. We pay special attention to linking
assembly language and BASIC programs, including a blow-by-blow interactive
session in which we link an assembly language program with
both interpreted and compiled BASIC.

Part III concentrates on applications. We develop many useful 8087 
assembly language routines in Part III. You can use these programs as 
examples, to learn more about 8087 programming, or you can use the 
programs "cookbook" fashion. (Part III also includes, in Chapter 12, an
explanation of some of the 8087's most advanced instructions.)

How to Read This Book 

I've taken care to write so that you can skip around from one section to 
the next as suits your mood. Please don't feel constrained to read from 
beginning to end. 

Most readers will probably find Part I informative and easy reading. 
If you want to write 8087 assembly language programs, concentrate on 
Part II. (If you are an experienced 8087 programmer, you can skip Part 
II and move on to the applications in Part III.) If you are interested in 
applications, but don't care about intimate programming details, read 
Part III. You can always flip back to Part II if you need to check something. 

Finally, you can use the programs here "cookbook" style. You don't
need to know why a program was written in a certain way or how it
operates internally, if you just want to get a fast answer. Go ahead and
use the programs. If a program is useful enough that you want to modify
it or write a similar one yourself, you can return later for the "how and
why."


The Cookbook 

Several chapters begin with an introductory paragraph and then a sign 
that says 

The Cookbook 


Under this sign, you will find a list of the programs appearing in the 
chapter, together with a brief description of the purpose of the program 
and the input and output required. Use the cookbook when you want 
to find a program in a hurry. We spend quite a bit of time discussing 
why certain things are done in certain ways. If you want to run programs, 
but not build your own, you don't need to read the "how and why"
material. Do scan the material which describes the information you need 
to pass to the programs. 








Strategic Number Crunching 

In addition to a great deal of detail about the 8087 and about numerical 
programming, this book presents a strategic approach to serious
computational work. Our strategy grows from two programming maxims:

• 10 percent of program code accounts for 90 percent of program execution 
time. 

• The cost of creating a working program is proportional to the square of the 
length of code, regardless of the power of the programming language being 
used. 

Serious programmers sometimes go to great effort, mistakenly, to write 
"efficient programs." A far better strategy is to identify the 10 percent 
of the code that has 90 percent of the computational burden. Re-write 
the 10 percent for maximum efficiency; write the other 90 percent for 
maximum clarity. 
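The first maxim can be turned into a small worked calculation. The sketch below uses Python for illustration only (the book's own programs are 8087 assembly called from BASIC). It assumes that exactly 90 percent of running time is spent in the critical 10 percent of the code, and that rewriting that portion makes it k times faster while everything else stays the same. The function name `overall_speedup` is hypothetical.

```python
def overall_speedup(hot_fraction: float, k: float) -> float:
    """Whole-program speedup when only the 'hot' share of running time
    is made k times faster and the remaining share is left unchanged."""
    cold_fraction = 1.0 - hot_fraction
    new_time = cold_fraction + hot_fraction / k  # old total time normalized to 1
    return 1.0 / new_time


# Assumption taken from the first maxim: 90 percent of execution time
# is spent in 10 percent of the code.
for k in (2, 10, 100):
    print(f"hot code {k:>3}x faster -> whole program {overall_speedup(0.9, k):.2f}x faster")
```

Under these assumptions the whole program runs roughly 1.8, 5.3, and 9.2 times faster. It can never beat a factor of 10, because the untouched 90 percent of the code still accounts for 10 percent of the original time. That ceiling is why the rest of the code can be written for clarity instead of speed.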

The search for efficiency often leads to writing programs in assembly 
language. Because assembly code can be 10 times the length of equivalent
BASIC code, and the cost of a working program grows with the square of
its length, assembly language programs can be 100 times harder to
debug. It (almost) never makes sense to write an entire program in
assembly language. It does make sense to code the critical routines in
assembly language. In this way, we get almost the entire advantage of
assembly language speed at a small fraction of the cost of assembly
language programming.
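The second maxim can also be made concrete. This Python sketch rests on several assumptions that are ours, not rules from the text:

- Cost equals a constant times the square of each module's length.
- The costs of separately written and debugged modules simply add.
- Assembly code is 10 times as long as equivalent BASIC.

The 1,000-line program size is purely illustrative, and `debug_cost` and the other names are hypothetical.

```python
def debug_cost(lines: float, c: float = 1.0) -> float:
    """Second maxim: cost grows with the square of the code length."""
    return c * lines ** 2


ASM_EXPANSION = 10   # assumption: assembly is 10x as long as BASIC
basic_lines = 1000   # illustrative program size, in BASIC lines

all_basic = debug_cost(basic_lines)
all_asm = debug_cost(ASM_EXPANSION * basic_lines)

# Mixed strategy: 90 percent of the code stays in BASIC; the critical
# 10 percent is rewritten in assembly as a separate module.
mixed = debug_cost(0.9 * basic_lines) + debug_cost(ASM_EXPANSION * 0.1 * basic_lines)

print(all_asm / all_basic)  # 100.0
print(mixed / all_basic)    # 1.81
```

Under these assumptions, the mixed program costs less than twice as much as an all-BASIC one, against 100 times as much for an all-assembly version. The additive-cost assumption is the key simplification. In practice the interface between BASIC and assembly adds some cost of its own.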

We can actually do even better by recognizing that many numerical 
programming problems use the same underlying subroutines. An 8087 
assembly language program to add up an array is somewhat more
complicated than a FOR/NEXT loop in BASIC. But we only need to write and
debug the 8087 program once. Having done so, using the subroutine 
over and over is probably easier than writing a FOR/NEXT loop every 
time we need to add up an array. Computer scientists call this planned 
reuse of subroutines "modular programming." For an example of the 
convenience and power of modular 8087 subroutines, take a look at the 
statistical package in Chapter 14. 

You can actually do even better. 8087 routines for many numerical 
computing needs appear in this book (and on the optional diskette). 
While we hope you decide to learn about all the capabilities of the 8087 
and to write your own special subroutines, you're more than welcome 
to begin by lifting the subroutines bodily from these pages and putting 
them to use in your 8087-equipped personal computer. 





Hardware and Software Requirements 

The programs in this book run on computers based on the Intel 8087 
Numeric Data Processor and the 8088/8086 family of Intel microprocessors.
In addition to the 8088 and 8086, this family includes the 188, 186,
and 286 microprocessors and the associated versions of the 8087. The
programs were all developed and tested on an IBM Personal Computer.
All timings assume the processor is a 5 megahertz 8088. Timings are only
approximate. (For example, the IBM PC runs about five percent slower 
than the stated timings. An 8086-based machine will be somewhat faster.) 
Timings given for BASIC programs refer to interpreted BASIC without 
an 8087, unless otherwise qualified. 

The 8087 assembly language programs can be called as subroutines 
from programs written in either interpreted Microsoft BASIC or compiled 
Microsoft BASIC, as available on the IBM Personal Computer. The
programs assume that data is stored in 8087-compatible format. (See Chapter
3 for an extensive discussion of 8087-compatible software.) The programs 
will run with pre-8087 versions of BASIC, but you will need to add the 
Microsoft-to-Intel conversion programs in the appendix.

In order to assemble the programs in the book, you will need an 
assembler that recognizes the full set of Intel mnemonic instructions. (Be 
warned that version 1.0 of the IBM Personal Computer MACRO Assembler
does not recognize 8087 instruction mnemonics, though it will generate
8087 instructions. You can still use this assembler if you are willing
to re-code the 8087 mnemonics into the 8088 ESCape instruction. On the
optional diskette, we have already re-coded the mnemonics, so you can
use the IBM Macro Assembler.) Since BASIC is the dominant personal
computer language, we've written all the programs to be called from BASIC
instead of FORTRAN or some other language more common on large computers.
If you want to combine the programs with a language that uses different
internal conventions from BASIC, you may have to re-write a few instructions.
The programs all work under Microsoft BASIC using version 1.1 of the 
PC-DOS operating system on the IBM Personal Computer. If you are 
using another computer or different software, some minor details may 
be different. 
Disclaimers and Limits on Liability


Legal style: 

The author and publisher of this book and any accompanying software 
hereby disclaim any and all guarantees and warranties, expressed and 
implied, on the programs and information herein. No liability for
damages, either direct or consequential, shall be assumed by author or
publisher. This product is sold on an "as is" basis; no fitness for any purpose
whatsoever nor warranty of merchantability is expressed or implied. 

People style: 

We've tried very hard to make sure that all the information given is 
correct and that all the programs work. Nonetheless, it is possible that 
somewhere in the hundreds of pages of manuscript and the thousands 
of lines of code, a bug lurks. The purpose of the book and software is 
to teach. When you use the programs here, make sure you fully test 
them. If most of your programming has been in BASIC, please take special 
note of the section in the book on error-handling. Assembly language 
programs are by their very nature less fool-proof than programs written 
in high-level languages. 

If, despite all my precautions, you think you've found a bug, please 
write me (c/o Robert J. Brady Company, Bowie, Maryland, 20715) so I 
can correct future editions. 


Trademarks 

The following trademarks are used in this book: 

• IBM, IBM Personal Computer, IBM PC, and PC-DOS are trademarks
of International Business Machines Corporation.

• 8086, 8087, 8088, 186, 188, 286, Numeric Data Processor, and iAPX
are trademarks of Intel Corporation. The 8087 and 8088 instruction
mnemonics are copyrighted by Intel.

• Microsoft and MS-DOS are trademarks of Microsoft Corporation.

• Apple II+ is a trademark of Apple Computer.

• DEC-2060 and VAX are trademarks of Digital Equipment Corporation.

• IMSL is a trademark of IMSL Inc.



Turning Minutes into Seconds


The 8087-equipped personal computer has three nice features: it's easy, 
accurate, and fast. Everything you need to apply 8087 power to practical 
computational problems is in this book. This first chapter describes the 
8087 and its use at the broadest level. 

Throughout the book, we try to take a scientific and analytical approach 
to understanding the 8087. Wherever possible, we discuss general
principles (the "why" of programming) along with the hundred-and-one
technical details needed to make a computer work. To keep the discussion
concrete, each general principle is illustrated with a practical application.
The 8087 is powerful, yet easy to use. We hope this book will be
occasionally mind-stretching, and fun as well.

How Easy Is Easy? 

The 8087 has been designed to emphasize ease of use as much as raw 
computational power. Your first step as an 8087 user is especially easy. 
Just add an 8087 to a personal computer and run your programs as usual. 
(You will need the version of BASIC, or other software, intended for use 
with the 8087.) Without any further effort, you can expect to see
improvements in execution speed ranging from about 20 percent to as much
as a factor of 10.

If you want the maximum advantage from the 8087's hardware power, 
you will need software specifically designed for the 8087. There is an 
extended discussion in Chapter 3 of what to look for—and what to look 
out for—when purchasing software for the 8087. Here we give a quick 
overview. 

The most important piece of knowledge about 8087-compatible software
is really a statement about hardware. The 8087 extends the capabilities of
existing processors without interfering with the processors' usual
operations. Therefore, any software designed "pre-8087" should continue to
operate normally when the 8087 is present.

Such 100 percent "upward compatibility" is a great advantage, but it 
does have a flip side. When you add an 8087 to a system, programs using 
"pre-8087" software do not speed up at all. For example, you can add 
an 8087 to the original IBM Personal Computer in a minute or two. (I 
added one to my IBM PC in order to write the programs for this book.) 
All your interpreted BASIC or compiled BASIC programs will run
correctly, but no faster. So whenever we make statements about the speed
advantage from adding an 8087, we are implicitly assuming the use of
8087-compatible software.

(With a little reprogramming, you can use the 8087 with pre-8087
versions of BASIC and other software. We discuss this problem in Chapter
3, but if you'd like a little reassurance, all the programs in this book were 
written using pre-8087 software.) 

You should be aware of one potential trap in buying software for use 
with the 8087. It is possible, though unlikely, that you will get into trouble 
by mixing software designed to take advantage of the 8087 with pre-8087 
software. See Chapter 3 for more discussion. 

Assuming you have the versions of BASIC or other programming
languages intended for use with the 8087, you can run all your usual
programs. Programs that do a great deal of numerical computing will race
when compared to pre-8087 performance. If you are a really heavy
number cruncher, you will eventually want to use a library of specially written
high-speed 8087 subroutines.

Part III of this book (Chapters 9-15) includes the most important
subroutines for numerical computing. All you need do is read the explanation
of how to use each subroutine, enter them into the computer, and
combine them with your BASIC programs. (On the diskette available with
this book, the subroutines have been typed in and assembled for you.)
When compared to pre-8087 BASIC, the use of these subroutines
increases execution speed by a factor of 10 to 200. (In rare cases,
improvement factors as high as 500 have been noted.)

Part II of this book (Chapters 5-8) prepares you for the most advanced
stage of 8087 use: writing your own subroutines. As you will see in the
examples throughout this book, programming the 8087 in assembly
language is relatively simple because of the 8087's elegant design. When
you've seen the examples and instructions here, you'll have no trouble
writing your own special purpose programs.


How Accurate Is Accurate? 

Easily-written, fast-executing programs are no great trick—if you don't 
care about getting the right answer. The most important attribute of the 
8087 is its remarkable accuracy. The 8087 has three accuracy-enhancing
features:

• Internal calculations yield 11 more bits of accuracy than BASIC double
precision numbers. That's worth three extra decimal places.

• Internal calculations have an extremely wide range. The 8087 can
represent numbers as large as $10^{4932}$ and as small as $10^{-4932}$. As a
result, calculations rarely overflow or underflow during intermediate
steps. In fact, both the precision and range of numbers are greater
than those found on most traditional mainframe computers.

• The 8087 is designed to handle a wide range of error conditions and
make an automatic, and graceful, recovery. As a result, simple "paper
and pencil" algorithms are much more likely to work. And when
something goes wrong, the 8087 follows well-behaved rules instead
of producing the wrong answer.
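The first two figures in this list follow directly from the bit layouts. The Python sketch below assumes the standard layouts:

- Double precision (long real) has a 53-bit significand.
- The 8087's internal temporary real has a 64-bit significand and a 15-bit exponent with bias 16383.

Each bit is worth log10(2), or about 0.301, decimal digits. Python floats cannot hold numbers near $10^{4932}$, so the range is computed through base-10 logarithms.

```python
import math

LOG10_2 = math.log10(2)

DOUBLE_SIGNIFICAND_BITS = 53     # BASIC/Intel double precision (long real)
TEMP_REAL_SIGNIFICAND_BITS = 64  # 8087 internal temporary real

extra_bits = TEMP_REAL_SIGNIFICAND_BITS - DOUBLE_SIGNIFICAND_BITS
extra_digits = extra_bits * LOG10_2
print(extra_bits, round(extra_digits, 2))  # 11 3.31

# Range of normalized temporary reals (exponent bias 16383): the largest
# value is just under 2**16384, the smallest is 2**-16382. We work with
# base-10 logarithms because Python floats cannot represent these values.
log10_max = 16384 * LOG10_2
log10_min = -16382 * LOG10_2
print(round(log10_max, 1), round(log10_min, 1))  # 4932.1 -4931.5
```

The 11 extra bits come to a little over three decimal digits. The exponents land at about 4932 in each direction, which matches the range quoted above.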


How Fast Is Fast? 

Just how fast is an 8087-equipped PC? A good comparison can be made 
to either a standard mainframe computer or to a microcomputer without 
an 8087. 

Perhaps the most remarkable statement to be made about the 8087 is 
that it actually makes sense to compare its speed to that of a mainframe 
computer costing hundreds of thousands of dollars. The 8087 is several
times slower than a half-million-dollar machine, but then it's more than
several times cheaper.

Exact comparisons are always risky, but a few numbers can give you 
a feeling for the speed of the 8087. Moderate speed mainframe computers 
require from about one to five microseconds to multiply two numbers. 
A supermini might require one microsecond. A $50,000 table-top mini 
might require about 3 microseconds. Efficient 8088 software uses about 
400 microseconds to multiply two numbers (about 900 microseconds for 
double precision). The 8087, which is an inexpensive add-on to a personal 
computer, uses 20 to 30 microseconds for the same task. 
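Turning these timings into ratios makes the comparison concrete. The Python sketch below takes the microsecond figures from the paragraph above as given. It assumes, as the text implies, that the 8087's 20 to 30 microseconds applies to both precisions. The conversion to clock cycles assumes the 5 megahertz clock stated in the introduction, where one cycle is 0.2 microseconds.

```python
CLOCK_MHZ = 5.0  # 5 MHz clock: 5 cycles per microsecond

software_us = {"single": 400.0, "double": 900.0}  # efficient 8088 software
hw_us = (20.0, 30.0)                              # 8087 multiply, fast and slow figures

for precision, sw in software_us.items():
    low_ratio = sw / hw_us[1]   # compared with the slower 8087 figure
    high_ratio = sw / hw_us[0]  # compared with the faster 8087 figure
    print(f"{precision}: 8087 is about {low_ratio:.0f} to {high_ratio:.0f} times faster")

cycles = [us * CLOCK_MHZ for us in hw_us]
print(cycles)  # [100.0, 150.0]
```

On these figures, an 8087 multiply takes roughly 100 to 150 clock cycles. Its advantage over software grows in double precision, where the software routine has more than twice as much work to do.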

For the very first time, a microcomputer is a cost-effective alternative 
to number crunching on large computers. The PC with an 8087 has 1/2
to 1/20 the speed of a large computer at 1/10 to 1/100 of the large machine's
cost. While large machines will always be more cost-effective than micros 
for some tasks, the 8087-equipped personal computer is the first micro 
to compete economically with its larger cousins. 

Most PC owners care more about how the 8087 will speed up their 
personal computing than about comparisons to large central computer 
facilities. The speed advantage of adding an 8087 to a PC depends on 
the application and on how you use the 8087. (Having read through this 
book, you'll know the methods for attaining the greatest possible
advantage.) The central point to understand is that the 8087 is a Numeric
Data Processor. The 8087 only speeds up programs involving numerical 
computation. If you only use the PC for word processing, the 8087 is 
about 99 percent irrelevant. But if you crunch the occasional number, 
adding an 8087 is like trading a sparkler for the Fourth of July fireworks 
display. 

The speed advantage of the 8087 depends very much on how you use 
it, but as an overall guide: 

The 8087 turns minutes into seconds. 

Specific Speed Comparisons 

Just how much you get out of an 8087 depends on the software you use 
as well as the 8087's hardware speed. Speed is discussed extensively in 
Chapter 4. We give a preliminary discussion here. 

What the 8087 will do for you depends on how much time your
software spends on various "overhead" tasks versus how much time is spent
in numerical calculations. The 8087 speeds up the numerical calculations 
but does little or nothing about the time spent on overhead. Table 1-1 
shows what kind of results you can expect when you combine the 8087 
with low-overhead, high-speed routines. 
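A simple way to picture this is to split running time into two parts. The overhead part is untouched by the 8087, and the numerical part is sped up by some hardware factor. The Python sketch below is a simplified model of that idea. The 70/30 and 2/98 splits and the hardware factor of 30 are illustrative assumptions, not measurements from the book, and `time_with_8087` is a hypothetical name.

```python
def time_with_8087(overhead_s: float, numeric_s: float, hw_factor: float) -> float:
    """Only the numerical part is accelerated; overhead time is unchanged."""
    return overhead_s + numeric_s / hw_factor


# Two hypothetical workloads, each taking 100 seconds without an 8087.
high_overhead = time_with_8087(overhead_s=70, numeric_s=30, hw_factor=30)
low_overhead = time_with_8087(overhead_s=2, numeric_s=98, hw_factor=30)

print(f"high-overhead software: 100 s -> {high_overhead:.1f} s")
print(f"low-overhead software:  100 s -> {low_overhead:.1f} s")
```

The same hardware factor gives about a 1.4-fold improvement in the high-overhead case and about 19-fold in the low-overhead case.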


Table 1-1. BASIC versus 8087 speed benchmarks (time in seconds).

Program         50 by 50 matrix multiplication    5,000 square roots
BASIC           1200                              52
8087 routine    8                                 0.35


The times in Table 1-1 compare (pre-8087) BASIC to special 8087 routines 
which you will find later in this book. The improvement is typical of 
what the combination of the 8087 and good software can do. Depending 
on the application, the 8087 hardware produces an improvement in speed 
by a factor of about 10 to 50—the rest is due to the low-overhead software. 
You won't see nearly as good an improvement if you use the 8087 with 
high-overhead software. (The BASIC interpreters built into a computer 
are, of necessity, high-overhead software.) Since the 8087 only speeds 
up numerical calculations, and such software spends relatively little time
on numerical calculation, the sum of overhead time and numerical
calculation time won't fall by nearly the amount shown in the table above.
The improvement will be impressive, nonetheless. 
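The ratios behind Table 1-1 can be checked directly. The Python sketch below takes the table's timings as given. It then uses the text's estimate that the 8087 hardware alone contributes a factor of about 10 to 50 to infer how much of the gain is left to the low-overhead software. That split is only as exact as the 10-to-50 estimate.

```python
# Timings from Table 1-1, in seconds: (BASIC, 8087 routine).
benchmarks = {
    "50 by 50 matrix multiplication": (1200.0, 8.0),
    "5,000 square roots": (52.0, 0.35),
}

for name, (basic_s, routine_s) in benchmarks.items():
    total = basic_s / routine_s
    # If the hardware accounts for a factor of 10 to 50, software supplies the rest.
    sw_low, sw_high = total / 50, total / 10
    print(f"{name}: {total:.0f}x overall; software factor about {sw_low:.0f} to {sw_high:.0f}")
```

Both benchmarks improve by a factor of roughly 150. Of that, a factor of roughly 3 to 15 is attributable to the low-overhead routines rather than to the 8087 itself.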
What Equipment Do You Need to Use an 8087?

You need an 8087, of course. You can get an 8087 either as part of the 
original equipment of your personal computer or by adding it to an 
existing machine. You can probably add an 8087 to any PC based on the 
Intel 8088 or 8086 family. The degree of difficulty of adding an 8087 
depends on whether the manufacturer provided a place for the 8087 when 
designing the computer. Even if no provision was made, it is probably 
possible to add an 8087. However, doing so requires quite a bit of
technical expertise.

The good news is that a number of manufacturers did provide a place 
for the 8087. In particular, when IBM (and those companies making 
compatible personal computers) introduced its first PC, it left an empty 
socket on the main circuit board expressly for the 8087. To add an 8087, 
you need only plug an 8087 into this empty socket. Plugging it in is easy 
(I installed my 8087 without help from anyone); easier, in fact, than 
adding a printed circuit board to one of the "expansion slots" inside the 
computer. (If you really know nothing at all about the inside of your 
computer, get someone to help you. Your computer is, after all, a fairly
expensive piece of equipment.) 

Once you have the cover off your machine, plugging in the 8087 takes 
under a minute. However, you may want to make one other hardware 
modification at the same time. Your computer probably has pre-8087 
software, such as a BASIC interpreter, wired into its Read Only Memory 
(ROM). If new, 8087-compatible software is available from your
manufacturer, you will want to upgrade the ROM chips at the same time.

What about folks who own a personal computer that is not based on
the Intel 8088/8086 family? Can they take advantage of the speed of the
8087? The answer, unfortunately, is a qualified "no." The 8087 works
only with the Intel family. However, because the Intel family is so
popular, several enterprising companies now sell circuit boards, carrying an
Intel processor, that fit into Apple and some other computers. Some of
these boards include an 8087 or make provision for one to be added.
These boards won't speed up programs executed on your original
processor, but they do allow you to make use of the programs in this book
and other 8087-compatible software. 
Processors and Co-processors

The "brain" of any computer is its "CPU," or central processing unit.
For the IBM PC, and many other "second generation" personal
computers, the "brain" is an Intel 8088. A complete, general purpose central
processing unit built into a single chip, the 8088 has a complete instruction 
set for 8- and 16-bit integer arithmetic, programming logic, and input 
and output. Like most microprocessors, the 8088 lacks the advanced 
mathematical instructions found in large, mainframe computers. 

The Intel 8087 Numeric Data Processor extends the instruction set of 
the Intel 8088 by adding sophisticated new mathematical instructions. 
The 8087 high-speed hardware carries out mathematical operations which 
would require thousands of lines of code if implemented in software. 
The 8087 hardware can operate 10 to 200 times faster than equivalent 
software. 



Why not include all the capabilities on one chip, rather than create
an add-on device? There are several reasons: 


• The 8087 is an extraordinarily sophisticated computational device,
including 75,000 transistors on a single chip. Even though the 8087
is "limited" to numerical processing, it is much more complex (and
more expensive) than the general-purpose 8088. Building two
separate chips holds down development costs and allows users and
system manufacturers to tailor-fit systems for different uses.

• The 8088 (and its 16-bit bus sibling, the 8086) was available to the
general market for several years before the first delivery of the 8087.
In designing 16-bit personal computers, several manufacturers left 
an open socket, labeled the "co-processor socket" on the IBM PC, 
so that machines could be upgraded easily when the 8087 became 
available. 


• Because the 8087 and 8088 are two devices, they can execute instructions
simultaneously. As a practical programming matter, this means that
while the 8087 completes one numerical computation, the 8088
prepares the next.

In the remainder of this chapter, we describe the capabilities of the 
8087 in a general way. Chapter 5 provides a much more detailed technical 
discussion. 


Overview of the 8087 

The 8087 serves as a co-processor with the 8088. The 8087 "watches"
instructions as they are received by the 8088. The 8087 processes its own
instructions, while allowing 8088 instructions to pass by. The 8088 also
watches all instructions, processing its own, while allowing 8087
instructions to pass by. The 8088 does provide one important service for the
8087. On seeing an 8087 instruction, the 8088 calculates any necessary
memory address and makes the address available to the 8087. The 8088
then proceeds immediately to the next instruction. In this way, the
co-processor design allows the 8087 and 8088 to execute instructions
simultaneously, thus considerably enhancing total system performance.

The central feature of the 8087's architecture is eight 80-bit data
registers. These registers are organized as a classic "pushdown stack," an
organizational technique that leads both to fast vector operations and to
efficient code generation by high-level language compilers. (Chapter 5
includes an extensive discussion on the operation of the pushdown stack.)
The 80-bit register width allows the 8087 to perform extremely accurate
calculations. While the 8087 instruction set recognizes seven different
data types in memory, all data is automatically converted to an 80-bit
internal representation when brought into the 8087. This frees the
programmer from most worries about converting between data types.
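The pushdown-stack organization can be pictured with a small model. The Python class below is a simplified sketch, not the chip's actual mechanism:

- It keeps at most eight values and treats ST(0) as the top of the stack.
- It holds ordinary 64-bit Python floats rather than 80-bit temporary reals.
- It raises a Python error on overflow or underflow. The real 8087 instead signals an invalid-operation exception.

The method names borrow the 8087 mnemonics FLD, FADDP, and FMULP, but only their effect on the stack is modeled.

```python
import operator


class RegisterStack:
    """Toy model of the 8087's eight-register pushdown stack.

    ST(0) is the top of the stack and ST(1) the register below it.
    Values are ordinary Python floats, not 80-bit temporary reals.
    """

    DEPTH = 8

    def __init__(self) -> None:
        self._regs = []  # the last element is ST(0)

    def fld(self, value: float) -> None:
        """Push a value onto the stack, like FLD."""
        if len(self._regs) == self.DEPTH:
            raise OverflowError("stack overflow: all eight registers in use")
        self._regs.append(float(value))

    def st(self, i: int = 0) -> float:
        """Return ST(i) without removing it."""
        if not 0 <= i < len(self._regs):
            raise IndexError(f"ST({i}) is empty")
        return self._regs[-1 - i]

    def _combine_and_pop(self, op) -> None:
        """ST(1) := op(ST(1), ST(0)), then pop, so the result becomes ST(0)."""
        if len(self._regs) < 2:
            raise IndexError("stack underflow: need ST(0) and ST(1)")
        top = self._regs.pop()
        self._regs[-1] = op(self._regs[-1], top)

    def faddp(self) -> None:
        self._combine_and_pop(operator.add)

    def fmulp(self) -> None:
        self._combine_and_pop(operator.mul)


# Evaluate (2 + 3) * 4 in the push-operands-then-operate order a compiler
# naturally generates for a stack machine.
s = RegisterStack()
s.fld(2)
s.fld(3)
s.faddp()
s.fld(4)
s.fmulp()
print(s.st(0))
```

The example shows why a stack suits compilers. An arithmetic expression translates directly into a sequence of pushes and operate-and-pop steps, with no need to decide which register holds which intermediate result.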


Instruction Classes 

Each of the 8087's 68 instructions falls into one of six classes. (The
classification scheme is a convenient way of describing the capabilities of the
8087. You needn't remember the classifications in order to use the 8087.) 
The six classes are: 

Data transfer (discussed in Chapter 6). These instructions move data
back and forth between the 8087 and memory and shuffle data
internally among the 8087 registers.

Arithmetic (discussed in Chapter 6). At the heart of the 8087 instruction 
set are the operations for addition, subtraction, multiplication, and 
division—plus some extras such as square root and absolute value. 
Transcendental (discussed in Chapter 12). The 8087 hardware has
built-in capabilities for computing logarithms and trigonometric functions.
(These instructions are rarely found even on large-scale mainframes.) 

Constants (discussed in Chapters 6 and 12). Seven of the most
frequently used constants, such as 0, 1, and pi, are built into the 8087.

Comparison (discussed in Chapters 6 and 7). These instructions are 
used for making less than/equal to/greater than, and other similar 
tests. 

Processor control (discussed in Chapter 12). This class of instructions 
gives the programmer total control over the behavior of the 8087. 
Some of these instructions are also used in conjunction with the 
comparison instructions and 8088 branching instructions to control 
program flow. 
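The text names only 0, 1, and pi among the built-in constants. Standard Intel documentation lists the full set of seven:

| Constant | Loaded by |
|---|---|
| 0 | FLDZ |
| 1 | FLD1 |
| pi | FLDPI |
| log2(10) | FLDL2T |
| log2(e) | FLDL2E |
| log10(2) | FLDLG2 |
| ln(2) | FLDLN2 |

The short Python sketch below tabulates them in double precision; the 8087 holds them to greater precision. It also shows why the logarithm constants are handy: the 8087's logarithm instruction works in base 2, so any other base costs one extra multiplication.

```python
import math

# The 8087's seven built-in constants and the instructions that load them.
# Python computes them in double precision; the 8087 keeps more digits.
CONSTANTS = {
    "FLDZ": 0.0,                  # zero
    "FLD1": 1.0,                  # one
    "FLDPI": math.pi,             # pi
    "FLDL2T": math.log2(10),      # log base 2 of 10
    "FLDL2E": math.log2(math.e),  # log base 2 of e
    "FLDLG2": math.log10(2),      # log base 10 of 2
    "FLDLN2": math.log(2),        # natural log of 2
}

for mnemonic, value in CONSTANTS.items():
    print(f"{mnemonic:<7} {value:.15f}")

# A base-2 logarithm converts to another base with one multiplication:
# ln(x) = log2(x) * ln(2) and log10(x) = log2(x) * log10(2).
x = 7.5
print(math.log2(x) * CONSTANTS["FLDLN2"], math.log(x))
```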


Data Types 


The seven regular 8087 data types are examined in depth in Chapter 5. 
However, for most ordinary 8087 programming considerations, only a 
few facts are really important. The only data types directly available in 
BASIC are integer, single precision, and double precision. Generally, 
only the latter two are used to hold numerical data. If your principal use
of the 8087 is scientific programming, you need remember only three
facts:


1. Single precision numbers (called short real in 8087 terminology) have
six or seven decimal digits of accuracy and occupy four bytes of
memory.

2. Double precision numbers (called long real in 8087 terminology) have
15 or 16 decimal digits of accuracy and occupy eight bytes of memory.

3. Temporary real numbers are used internally for all 8087
calculations. They have better than 18 decimal digits of accuracy.
When stored in memory, a temporary real occupies 10 bytes.
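The byte and digit counts above can be checked in Python. Its `struct` module packs IEEE single (format `f`) and double (format `d`) values, the layouts the 8087 calls short real and long real. Python has no 80-bit type, so for temporary real the sketch only computes the digit count, from its 64-bit significand. The rule of thumb used here is digits ≈ significand bits × log10(2).

```python
import math
import struct

# (8087 name, struct format or None, total bytes, significand bits)
REAL_TYPES = [
    ("short real", "<f", 4, 24),
    ("long real", "<d", 8, 53),
    ("temporary real", None, 10, 64),  # no native Python equivalent
]

for name, fmt, nbytes, bits in REAL_TYPES:
    if fmt is not None:
        # Confirm the storage size of the formats Python can pack.
        assert struct.calcsize(fmt) == nbytes
    digits = bits * math.log10(2)
    print(f"{name:<15} {nbytes:>2} bytes  ~{digits:.1f} decimal digits")

# Rounding 1/3 through a short real keeps only about seven significant digits.
third = struct.unpack("<f", struct.pack("<f", 1 / 3))[0]
print(third)
```

The computed digit counts (about 7.2, 16.0, and 19.3) line up with the "six or seven," "15 or 16," and "better than 18" figures in the list.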


If you are primarily a number cruncher, these three data types will
probably account for 95 percent of your use. However, the 8087
recognizes four additional data types:

1. Integer numbers (called word integer in 8087 terminology) occupy two
bytes of storage and are used principally to index arrays and other
data structures. BASIC and the 8087 use the same representation
for integer data.

2. A short integer occupies four bytes. While the largest (signed) word
integer is 32,767, a short integer can be as large as two billion.

3. A long integer occupies eight bytes. A long integer has two or three
more digits of accuracy than a double precision real number and
can hold values as large as $10^{18}$.

4. Packed decimal representation is used for business and data
processing operations. A packed decimal uses 10 bytes of memory and
holds 18 decimal digits. Unlike the three preceding data types, the
packed decimal form uses decimal rather than binary representation.
The decimal digits are then "packed" two to a byte.
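The integer ranges follow from two's-complement arithmetic on 16, 32, and 64 bits. The packed decimal layout can be sketched in a few lines of Python. The `pack_bcd` function is a hypothetical illustration, assuming the 8087 layout as commonly documented:

- Nine bytes hold 18 digits, two per byte.
- The least significant byte comes first, with the lower digit in each byte's low four bits.
- A tenth byte carries the sign in its high bit.

```python
# Signed two's-complement ranges of the 8087 integer types.
for name, bits in (("word integer", 16), ("short integer", 32), ("long integer", 64)):
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    print(f"{name:<14} {bits // 8} bytes  {lo:,} to {hi:,}")


def pack_bcd(n: int) -> bytes:
    """Pack an integer of up to 18 decimal digits into 10 bytes:
    two digits per byte in bytes 0-8 (least significant first),
    sign in the high bit of byte 9."""
    if abs(n) >= 10 ** 18:
        raise ValueError("packed decimal holds at most 18 digits")
    out = bytearray(10)
    m = abs(n)
    for i in range(9):
        low_digit, m = m % 10, m // 10
        high_digit, m = m % 10, m // 10
        out[i] = (high_digit << 4) | low_digit
    if n < 0:
        out[9] = 0x80  # sign bit; the digits themselves are unsigned
    return bytes(out)


print(pack_bcd(-1234).hex(" "))
```

Unlike the binary integer types, packed decimal keeps the sign in a separate bit rather than in two's complement. As a result, 1234 and -1234 differ only in byte 9.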

By way of contrast, the types of data recognized by the 8088 hardware 
are limited to one- and two-byte binary integers and short packed decimal 
values. All the numerical processing in pre-8087 BASIC and other
high-level languages is performed by software created from operations on
integers. The 8087 eliminates the need for such software. Not only are 
8087-based systems faster, but programs use up much less space and 
numerical results are more reliable. 

How Does My Computer Access 
the Power of the 8087? 

In the next chapter we discuss software for the 8087. In order to under¬ 
stand why some software is 8087-compatible—and why some isn't—it 
helps to review the basics of the 8088/8087 co-operative set-up. 

The instruction set for the 8088 was designed to be extended at a later 
date. One of the 8088's instructions is called the escape instruction. The 
8088 knows that the escape instruction really calls for an operation on
the 8087, so it essentially ignores this instruction and allows the 8087 to 
process it. The instructions used by the 8087 are different varieties of the 
8088 escape. 
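At the byte level, the escape instructions are the 8088 opcodes D8 through DF hexadecimal (binary 11011xxx). The Python classifier below is a minimal sketch under that assumption. It looks only at the first opcode byte and ignores prefixes. It also ignores the WAIT instruction (9B hex) that an assembler normally places before 8087 instructions on an 8088 system. The sample encodings are standard Intel encodings: D9 E8 for FLD1, D9 FA for FSQRT, and B8 for MOV AX with an immediate operand.

```python
def is_escape(opcode: int) -> bool:
    """True if an 8088 opcode byte is an ESC (binary 11011xxx, D8h-DFh)."""
    return (opcode & 0b1111_1000) == 0b1101_1000


# A few instruction encodings as they might appear in memory.
samples = {
    "FLD1  (load constant 1)": bytes([0xD9, 0xE8]),
    "FSQRT (square root)": bytes([0xD9, 0xFA]),
    "MOV AX, 1234h": bytes([0xB8, 0x34, 0x12]),
    "NOP": bytes([0x90]),
}

for text, code in samples.items():
    owner = "8087" if is_escape(code[0]) else "8088"
    print(f"{code.hex(' '):<10} {text:<25} -> handled by the {owner}")
```

Only the first byte is needed to decide which processor owns an instruction. The three low-order bits of the escape byte, together with bits of the byte that follows, then tell the 8087 which operation to perform.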

When both the 8088 and 8087 are installed, we can think of the
combination as one large computer with expanded capability. Software which
uses the escape instruction internally must have an 8087 present in order 
to operate correctly. Software built "pre-8087" simply does not use the 
escape instruction and therefore does not take advantage of the new 
capacity. 

If you are writing your own programs at the machine language level, 
you'll know whether or not you've used the escape instruction. Most of 
the time you use a computer, such intimate internal detail isn't under 
your control. In the next chapter, we discuss some of the varieties of 
8087-compatible—and incompatible—software. 




Buying and Building 

8087-Compatible 

Software 


What special considerations apply when buying or building software for 
use with the 8087? Your first question will always be, "What software
works?" Your second question, "How well?" In this chapter, we break
our analysis of software compatibility into three parts. In the first part,
we discuss some important technical details about compatibility. In the
second section, we analyze why some software produces very fast
programs—and why some does not. In the chapter's last section, we discuss
the merits of various types of software in terms of programming
convenience and calculation speed.


Compatibility—The Technical Details 

Suppose we could look at a program that had been translated into our
computer's "machine language." The program either uses the machine
language instructions that drive the 8087 (the "escape instructions"
mentioned in the last chapter), or it doesn't. If it doesn't use these
instructions, then the 8087 is irrelevant. The program will run with or
without an 8087 and will run at the same speed either way. If the program
does use 8087 instructions, then the 8087 must be present, of course.

As it turns out, there is a second issue, equally important for
compatibility, which hinges on a detail of software design. All computers
represent numbers internally as particular patterns of 0's and 1's.
Different computers use different patterns for the same number. For the
most part, we don't care which pattern the computer uses, since we don't
see the individual 0's and 1's anyway. The important thing is that the
computer's hardware knows how to interpret its own patterns. (As it
happens, the representation used on the 8087 has been proposed as an
industry standard. For the curious, we show what the 8087's
representation looks like in Chapter 5.)


Until the introduction of the 8087, personal computers based on the 
8088 family had hardware for integer arithmetic only. Since there was 
no hardware "with an opinion" on how non-integers should be repre¬ 
sented, each software designer was free to choose his or her own patterns. 
In practice, this meant that whoever built translators for programming 
languages (compilers, interpreters, and assemblers) made the decision 
for everyo^ne using a part icular language. Since-Microsoft has been the 
principal siipplft^^^^|^ ^^ iiit»Hl,*<ii ilgbages for 16-bit computers, the 
vast bulk of software useP^the patterns chosen by Microsoft. 


Unfortunately, the Microsoft pattern and the Intel 8087 pattern are different. 
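To see what "different patterns" means in practice, the Python sketch below encodes 1.0 in both formats. It assumes the single precision layout usually documented for Microsoft Binary Format (MBF): three mantissa bytes, least significant first, with the sign in the top bit of the third byte, followed by an exponent byte, the value being 0.1mmm... (binary) times 2 to the power (exponent - 128). The 8087 single precision pattern is the IEEE one that Python's `struct` module produces with the `"<f"` format. `to_mbf_single` is a hypothetical helper, not a routine from this book.

```python
import math
import struct

def to_mbf_single(x: float) -> bytes:
    """Encode x as a 4-byte Microsoft Binary Format single (assumed layout)."""
    if x == 0:
        return bytes(4)                    # exponent byte 0 means zero
    m, e = math.frexp(abs(x))              # abs(x) = m * 2**e, 0.5 <= m < 1
    mant = round(m * (1 << 24))            # 24 bits including the leading 1
    if mant == 1 << 24:                    # rounding carried into a new bit
        mant >>= 1
        e += 1
    exponent = e + 128
    if not 1 <= exponent <= 255:
        raise OverflowError("value outside MBF single range")
    sign = 0x80 if x < 0 else 0
    # The leading 1 is implied, so its bit position carries the sign instead.
    return bytes([mant & 0xFF, (mant >> 8) & 0xFF,
                  ((mant >> 16) & 0x7F) | sign, exponent])

mbf = to_mbf_single(1.0)
ieee = struct.pack("<f", 1.0)
print("1.0 in Microsoft format:", mbf.hex(" "))    # 00 00 00 81
print("1.0 in 8087 format:     ", ieee.hex(" "))   # 00 00 80 3f

# Reading the Microsoft bytes as if they were an 8087 single:
misread = struct.unpack("<f", mbf)[0]
print("misread value:", misread)   # -2**-125, roughly -2.35e-38: garbage
```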

The result of this conflict is that pre-8087 software and 8087-compatible
software cannot trade data represented in their respective internal
formats. With your 8087 in place, you can safely use either pre-8087 or
8087-compatible software, but if you combine programs produced with
pre-8087 and 8087-compatible translators, you will usually get garbage.
Further, if you try to exchange data between such programs you will get
garbage if the data was stored using the computer's internal format. If
the data is not stored in the internal format, then the programs can
probably exchange data.

There is no general rule as to whether a conflict will occur between 
two pieces of software; you need to know the particulars of each program. 
In the third section of this chapter, we give some examples of where to 
look for trouble. 


What Makes a Program Fast or Slow? 

Three factors determine a program's speed: the way you solve the problem
(what computer scientists call the "algorithm"); your hardware's speed;
and the behavior of the programming language translator. The first is
always the most important. There is no computer so fast that it cannot 
be slowed to a crawl by a sufficiently bad way to solve a problem. The 
applications chapters of Part III supply high-speed solution techniques 
to many problems in numerical programming. 

You settle the question of hardware speed, of course, when you get
an 8087. If hardware were the only determinant, your program execution
time would be cut by a factor of 10 to 50! 

But hardware isn't the only determinant. Depending on how your 
program is translated into instructions the computer can understand, 
using an 8087 may drop execution time by only a few percent or speed 
up execution by a factor of 200. For this reason, and because you can 
exercise a fair amount of control over which translator you use, we
concentrate on this third factor.


Translating the Source Program 

Suppose we instruct the computer to add variables A and B and to save 
the result in variable C. A typical command might look like this: 

C = A+B

The process of going from command to answer is composed of three 
phases: 

• Translation time 

• Invocation time 

• Calculation time 

Translation time is the time the computer takes to figure out what to
do. For example, every time the BASIC interpreter sees "C = A+B," it
has to translate this to mean "find the variable A in memory and then 
find the variable B, next add the two, and finally place the sum in variable 
C." The BASIC compiler makes the same translation as the interpreter, 
but only once, rather than every time a line is executed. Interpreted 
programs spend a lot of time in the translation phase while compiled 
programs spend none at all. 
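The difference between re-translating a line on every execution and translating it once can be sketched with a toy "language" that understands only lines of the form C = A+B or C = A*B. This is an analogy in Python, not how the BASIC interpreter or compiler is actually built; `parse`, `interpret`, and `compile_line` are hypothetical names.

```python
import operator

OPERATORS = {"+": operator.add, "*": operator.mul}

def parse(line: str):
    """Translation: work out the target, operator, and operands of a line."""
    target, expr = (part.strip() for part in line.split("="))
    for symbol, func in OPERATORS.items():
        if symbol in expr:
            left, right = (part.strip() for part in expr.split(symbol))
            return target, func, left, right
    raise SyntaxError(f"cannot translate {line!r}")

def interpret(line: str, env: dict) -> None:
    """Interpreter style: translate the text on every execution."""
    target, func, left, right = parse(line)
    env[target] = func(env[left], env[right])   # look up, call, calculate

def compile_line(line: str):
    """Compiler style: translate once, return code that skips translation."""
    target, func, left, right = parse(line)
    def run(env: dict) -> None:
        env[target] = func(env[left], env[right])
    return run

env = {"A": 2.0, "B": 3.0}
for _ in range(1000):
    interpret("C = A+B", env)     # the text is parsed 1,000 times
step = compile_line("C = A+B")    # the text is parsed once
for _ in range(1000):
    step(env)
print(env["C"])                   # 5.0
```

Both loops compute the same answer; only the compiled version avoids paying the translation cost on every pass.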

Invocation time is the time it takes the computer to calculate the
addresses of the variables and to call the appropriate subroutine.
For example, the BASIC ROM includes a floating-point addition
subroutine. The interpreter calls this subroutine to add A and B. Code produced
by the BASIC compiler calls a similar routine in the run-time library.

Calculation time is the time the computer spends doing the actual
addition. The direct advantage of the 8087 hardware comes from
improvement in this phase.

Since the 8087 speeds up only this last phase, programs in which most
of the time is spent in calculation get a big boost. Programs which spend 
most of their time in translation or invocation get only a small boost. 
Reduction of translation and invocation time depends on the appropriate 
choice of a program translator. 

You might think that we would always choose the translator that gives 
the fastest results. However, there are some tradeoffs involved. For
example, compilers produce faster programs than interpreters, but
interpreters are more convenient to use. And, as a practical matter, almost
every personal computer comes with a built-in BASIC interpreter, but 
not everyone has a compiler. 

So how important is each phase? The answer depends on the problem. 
In Chapter 1, we presented some representative timings for a matrix 
multiplication problem and for taking 5,000 square roots. I've made some
estimates of the time spent in each phase for pre-8087 interpreted BASIC, 
for an 8087-compatible compiler, and for an 8087 assembly language 
program. Table 3-1 shows the time in microseconds for a single addition 
and multiplication (for the matrix program), and for taking the square 
root of one element of a vector. I do have to warn you that Table 3-1 is 
much less accurate than other timings given in this book. Nonetheless, 
it gives a rough guide as to the trade-offs involved. 


Table 3-1. Execution-time speed breakdowns (time in microseconds).

                          Matrix Problem                   Square Root
                   translate  invoke  calculate   translate  invoke  calculate

interpreter        (------8400------)    1200     (------3600------)    6800
compiler               0       135        56          0        66        70
assembly language      0        10        56          0         0        70

(For the interpreter, translation and invocation time are given only as a
single combined figure.)
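The per-operation figures in Table 3-1 can be turned back into total running times by multiplying by the number of operations: a 50 by 50 matrix product takes 50 × 50 × 50 = 125,000 multiply-add steps, and the other problem takes 5,000 square roots. The short Python calculation below does that arithmetic; the interpreter and assembly language totals it produces (1200 and 52 seconds; about 8 and 0.35 seconds) match the timings in Table 4-1. The dictionary layout is only a convenient encoding of the table.

```python
# Table 3-1 in microseconds per operation: (translate + invoke, calculate).
# The interpreter's translate and invoke times are only given combined.
TABLE_3_1 = {
    "interpreter": {"matrix": (8400, 1200), "sqrt": (3600, 6800)},
    "compiler":    {"matrix": (0 + 135, 56), "sqrt": (0 + 66, 70)},
    "assembly":    {"matrix": (0 + 10, 56), "sqrt": (0 + 0, 70)},
}

# Operation counts: 50**3 multiply-adds for the 50 by 50 product, 5,000 roots.
OPERATIONS = {"matrix": 50 ** 3, "sqrt": 5_000}

def total_seconds(translator: str, problem: str) -> float:
    """Total run time implied by the per-operation breakdown."""
    per_op_us = sum(TABLE_3_1[translator][problem])
    return per_op_us * OPERATIONS[problem] / 1e6

for name in TABLE_3_1:
    print(f"{name:12s} matrix {total_seconds(name, 'matrix'):8.2f} s"
          f"   square roots {total_seconds(name, 'sqrt'):6.2f} s")
```

Dividing the interpreter totals by the compiler totals gives ratios of roughly 50 and 76, close to the figures quoted later in this chapter for a compiler with an 8087 library.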


We will refer back to Table 3-1 several times in our discussions in the 
next section. While the table shows the speed advantage of assembly 
language, it does not reveal the extra work generally involved in writing 
assembly language programs rather than BASIC. As a rule of thumb, an
assembly language program requires ten times as much code as one
written in BASIC.

The bulk of numerical computing uses what are called "linear opera¬ 
tions." A small family of programs, such as matrix multiplication, can 
be put together to solve all sorts of different linear problems. With a 
library of these routines, such as the library put together in this book, 
you can solve most problems without having to write any subroutines 
yourself. 

The square root example is somewhat different. "Non-linear" opera¬ 
tions are all different; there isn't a small family of routines that you can 
re-arrange as needed for your own problems. As a result, non-linear 
problems require more custom programming. The more programming 
required, the more we will want to favor programming convenience over 
calculation speed. 

Both the matrix multiplication routine and the square root routine 
appear, in assembly language, in later chapters. As assembly language 
programs go, neither is very difficult to write. (. . . and of course you 
needn't write these particular programs, since we've already done so.) 
Computational Accuracy

Accuracy deserves as much attention as does speed. The 8087 is extremely 
accurate, but most translators don't allow you to access the 8087's 80-bit 
registers. Assembly language allows full use of 8087 accuracy, as do a 
few compilers (notably, those developed at Intel) intended specifically 
for use with the 8087. These compilers, which provide for operations on 
80-bit data, are not, at present, in common use. 

For some problems, the extra accuracy of 80 bits is worth any amount 
of programming inconvenience, but for "every day" use most of us will 
settle for double precision accuracy. (The disappointing omission of
double precision makes several prominent compilers used on personal
computers unacceptable for general number crunching.) The assembly
language routines in this book use 80-bit data in the "delicate" part of
calculations and the usual single and double precision data types
elsewhere.

8087-compatible Software 

In this section we discuss a number of different approaches to buying 
and building 8087-compatible software. For each approach, we discuss 
the trade-offs between programming convenience and execution speed. 
The approaches discussed are: 

• Using packaged programs 

• 8087 hardware with pre-8087 software 

• Interpreted BASIC 

• Compiler with 8087 floating point library 

• Compiler for 8087 "native code" 

• Assembly language modules for BASIC 

• Pure assembly language code 

Using Packaged Programs 

How much advantage the 8087 gives you with a "canned" program 
depends on how well the program is written. A really well-written canned 
program will take better advantage of the 8087 than any program you 
write. Not because the programmer knew anything about the 8087 that 
you won't discover in this book, but because for a program that sells 
thousands of copies, a programmer can afford to spend time squeezing 
out every last microsecond. Unfortunately, there is no really satisfactory
way of knowing how good a canned program is short of "field testing"
it. Also, unfortunately, software manuals almost never say anything about 
execution speed. 
You will find three kinds of packages being advertised (with respect
to 8087 compatibility). 

First, there are packages that run only with the 8087, which
make no attempt at compatibility with earlier software or non-8087
machines. Many applied problems cannot be solved on a microcomputer
(in a reasonable amount of time) without an 8087. For programs that 
solve such problems, compatibility is not an issue. In fact, the speed of 
the 8087 is so critical for some applications that enterprising software 
houses began to market 8087-only packages before the manufacturers of 
personal computers had begun to sell the 8087! 

Second, there are programs that run either with or without the
8087. Some software comes in a single version that will run either way.
Other programs come in two versions: one explicitly for the 8087 and 
one that does not use the 8087. 

Third, there are programs that ignore the 8087. Almost all of these
programs will run with the 8087 and those that are written in BASIC will 
automatically take advantage of the 8087 if you have an 8087-compatible 
BASIC interpreter in your computer. 

A first warning about canned programs. Many high-efficiency
programs save information on disk in what are called "binary" files. Binary
files store data using the computer's internal representation of numbers 
rather than the "ASCII" representation more commonly used for disk 
storage. (This scheme allows programs to avoid conversions between 
internal and external formats and thus makes data storage and retrieval 
much, much faster.) As discussed above, the 8087 uses a different internal 
representation for numbers than does most pre-8087 software. For this 
reason, pre-8087 and 8087-compatible binary files are incompatible. 

If you use a pre-8087 program that saves binary files on disk and then
switch to 8087-compatible software, you will be unable to read the files 
back in. Further, since you usually do not have access to a description 
of the file format, it may be impossible for you to convert the files yourself. 
To protect yourself when using a canned program with binary files, use 
the program to convert the files into an ASCII representation while you 
are still using the pre-8087 software and then convert them back to binary 
later. 

A second warning about canned programs. Many high-efficiency
programs use small amounts of assembly language code to speed up
important calculations. You do not generally have any way of finding out
whether a particular package uses any machine code. If the machine 
language routines think numbers are stored using Microsoft's original 
format and the BASIC part of the program operates using Intel format 
. . . well, you can imagine the results. 
8087 Hardware with Pre-8087 Software

It would be awfully nice if we could get the benefit of the 8087 without 
attention to software. For reasons we've discussed, this isn't possible. 
For example, if you add an 8087 to a machine with a pre-8087 BASIC 
interpreter, your BASIC programs will run, but they won't make any 
use of the 8087. This is not much of an option. 

Understand, however, that it's the translator, not the program, that needs
to be 8087-compatible. If you have an 8087-compatible BASIC interpreter,
or some other 8087-compatible translator, your old BASIC programs will
run and will take advantage of the 8087. (This illustrates an important
reason for using BASIC or another standard "high-level language." If 
the hardware changes, as is the case when an 8087 is added, you need 
only obtain a new translator and usually do not need to re-write your 
applications programs.) 

It is possible to combine 8087-compatible software with pre-8087
software by explicitly converting data back and forth between the Intel and
Microsoft formats. (Conversion programs appear in the appendix.) For 
example, you can use the 8087 programs in this book with the original 
BASIC interpreter supplied with the IBM Personal Computer, but you 
will have to do a little bit of extra BASIC programming. 


Interpreted BASIC 

Depending on when you bought your personal computer, it will either 
include an 8087-compatible BASIC interpreter or you may be able to buy 
such an interpreter to replace the computer's original BASIC ROM. For 
most applications, the BASIC interpreter provides the easiest
programming and the slowest execution.

The 8087 does not substantially affect the speed of the translation or 
invocation phase of the interpreter's operation, but the calculation phase 
flies with an 8087 in place. Refer back to Table 3-1. For a problem like 
matrix multiplication, most of the action is in translation and invocation, 
so you can't expect more than about a 10 to 15 percent improvement 
over pre-8087 BASIC. 

Calculation time was a far greater fraction of total execution time in 
the square root problem. The 8087 has much more impact here; we might 
expect an overall gain of about a factor of three. Some non-linear
functions, such as the trigonometric operations, spend even more time in the
calculation phase. In some cases, we might see improvement by a factor 
of eight. 
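These estimates follow from a simple rule (often called Amdahl's law): if the 8087 speeds up only the calculation phase, the overall gain is the old total time divided by the new total time. The Python sketch below applies it to the interpreter rows of Table 3-1, under the assumption that the 8087 cuts the calculation phase to the 56 and 70 microseconds shown for the 8087 rows of that table while translation and invocation stay the same.

```python
def overall_speedup(fixed_us: float, calc_before_us: float, calc_after_us: float) -> float:
    """Speedup when only the calculation phase gets faster (Amdahl's law)."""
    return (fixed_us + calc_before_us) / (fixed_us + calc_after_us)

# Interpreter figures from Table 3-1, in microseconds per operation.
# fixed_us = translation + invocation, assumed unchanged by the 8087.
matrix = overall_speedup(fixed_us=8400, calc_before_us=1200, calc_after_us=56)
roots = overall_speedup(fixed_us=3600, calc_before_us=6800, calc_after_us=70)

print(f"matrix multiplication: {matrix:.2f} times as fast ({1 - 1 / matrix:.0%} less time)")
print(f"square roots:          {roots:.2f} times as fast")
```

This works out to roughly a 12 percent saving for matrix multiplication and a factor of about 2.8 for square roots, in line with the estimates above.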

We're ready now to draw our first conclusions. 

If most of your number crunching involves linear operations, the 8087 with 
the updated BASIC interpreter ALONE has only limited value. 
If much of your number crunching uses the non-linear functions, the 8087
with the updated BASIC interpreter is worth several non-8087 PCs. 

Here's an important warning about using the 8087 version of BASIC. No 
matter what you may be told, the 8087 and non-8087 versions of BASIC 
are not fully compatible (though they are close). Because floating point 
numbers are represented differently, there is no way to make them fully 
compatible. Two fundamentally irresolvable problems exist. 

First, the two floating point representations differ slightly in their
precision and range. In particular, for double precision the Intel format trades
about one decimal place less precision for a substantially increased range
for the exponent. On rare occasions, programs that worked on the
original BASIC interpreter will give incorrect answers when used on the 8087
version because of round-off error. Somewhat more frequently, programs 
that run under the new version will have overflow errors if used on a 
personal computer with the old BASIC. Fortunately, such problems are 
rare, and quite unlikely to be a major concern for most users. 

Second, some programs use the BASIC functions MKS$, MKD$, CVS, 
and CVD to convert back and forth between floating point numbers and 
strings. Typically, this is done in order to store numbers on a disk file 
in their binary representation. The functions work in both versions of 
BASIC. But if you store numbers on the disk in one version and retrieve 
them in the other, you will get garbage data without getting any
indications of error. If you use binary-representation files for storage between
program runs, be absolutely certain to convert the files as part of the 
process of changing from one version of BASIC to the other. 
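As one possible way to do that conversion, here is a Python sketch that rewrites 4-byte single precision values written by MKS$ under the original BASIC into the 8087 (IEEE) pattern that `struct` produces with the `"<f"` format. It assumes the usual Microsoft Binary Format layout (three mantissa bytes, least significant first, sign in the top bit of the third byte, then an exponent byte, the value being 0.1mmm... (binary) times 2 to the power (exponent - 128)). It also assumes the file is nothing but a run of such values; files that mix strings and numbers in records would need their layout handled field by field, and MKD$ doubles would need a separate 8-byte routine. This is not the book's appendix program, and the function names are hypothetical.

```python
import math
import struct

def mbf_single_to_float(b: bytes) -> float:
    """Decode a 4-byte Microsoft Binary Format single (assumed layout)."""
    if len(b) != 4:
        raise ValueError("an MBF single is 4 bytes")
    if b[3] == 0:                          # exponent byte 0 means the value is zero
        return 0.0
    # Restore the implied leading 1 (bit 23) in place of the sign bit.
    mant = ((b[2] & 0x7F) << 16) | (b[1] << 8) | b[0] | 0x800000
    value = math.ldexp(mant, b[3] - 152)   # (mant / 2**24) * 2**(exponent - 128)
    return -value if b[2] & 0x80 else value

def convert_mbf_records(data: bytes) -> bytes:
    """Rewrite a run of MBF singles as 8087 (IEEE) singles, 4 bytes each."""
    if len(data) % 4:
        raise ValueError("length is not a multiple of 4")
    out = bytearray()
    for i in range(0, len(data), 4):
        out += struct.pack("<f", mbf_single_to_float(data[i:i + 4]))
    return bytes(out)

old = bytes([0x00, 0x00, 0x00, 0x81,    # 1.0 in Microsoft format
             0x00, 0x00, 0x20, 0x84])   # 10.0 in Microsoft format
print(struct.unpack("<2f", convert_mbf_records(old)))   # (1.0, 10.0)
```

Because both single precision formats carry a 24-bit mantissa, ordinary values convert exactly; only the very smallest Microsoft-format values fall below the 8087's normal range.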

Compiler with 8087 Floating Point Library 

A compiler differs from an interpreter in that it translates the source 
language program only once, rather than every time a line of code is 
executed. Compilers have some disadvantages: they take a relatively long 
time to translate a program; they usually generate code that takes up 
more space than does an interpreted program; they slow the business of 
debugging programs; and they can be expensive. But they have one 
undeniable advantage over an interpreter: they eliminate the translation 
phase from program execution, and thereby reduce execution time
enormously.

Many of the compilers used on personal computers handle floating 
point operations in the following way. Whenever a floating point
operation is needed, the compiler generates a CALL to the appropriate
subroutine. After the program is compiled, the LINK program is used to
combine the compiler output with a library of subroutines that includes 
all the floating point operations. IBM's BASIC compiler works this way. 

Compilers that use floating point libraries can be converted to 8087 
operation by substituting a new library for the one originally supplied 
with the compiler. The original IBM BASIC compiler can be converted
in this manner. Using a compiler with an 8087 library not only eliminates 
the translation phase, but also brings the calculation phase up to 8087 
speed. However, the invocation phase remains unchanged. Referring 
back to Table 3-1, we see that such a compiler might be 50 times as fast 
as pre-8087 BASIC in the matrix multiplication example and about 75 
times as fast on square roots. 

(You should be warned that the effectiveness of this approach to
making a compiler 8087-compatible varies. Some implementations do not do
nearly as well as the speed improvements suggested in the previous 
paragraph.) 

Another conclusion now: 

On linear problems, the combination of the 8087 and a compiler is very, very 
good. (Even if it doesn't quite reach our goal of "turning minutes into 
seconds.") On non-linear problems the combination is truly excellent. 


Compiler for 8087 “Native Code” 

Compilers on mainframe computers, and on minicomputers with floating 
point hardware, directly generate floating point instructions instead of 
generating calls to a subroutine library. This technique eliminates most 
of the invocation time. Some mainframe "optimizing" compilers are so 
good that the code they generate is almost as fast as assembly code. 
Equally good compilers for personal computers are only beginning to 
appear and are not currently in widespread use. You may want to look 
for 8087 "native code" compilers as they come on the market, since such 
a compiler provides the very combination of execution speed and pro¬ 
gramming convenience. 


Assembly Language Modules for BASIC 

Assembly language is at the bottom of the list when it comes to
programming convenience, but at the top of the list when it comes to speed.
Fortunately, assembly language routines are easily combined with either 
interpreted or compiled BASIC, as well as with programs written in other 
high-level languages. In fact, preparing assembly language modules for 
frequently used tasks can be more convenient than writing the same code 
over and over again in BASIC. (It is very inconvenient to write re-usable 
modules in BASIC.) 

In a typical program, almost all the work takes place in a very small 
fraction of the code. Optimally, we use assembly language modules to 
replace this fraction of the code, while leaving the remainder of the 
program intact. This strategy leaves the bulk of the writing in a convenient 
programming language and the bulk of the computation in a high speed
routine. 

Assembly language remains the undisputed speed champion. The
assembly language matrix multiplication routine which appears in Chapter
10 is about 150 times faster than pre-8087 BASIC. The square root routine
also beats BASIC by about 150-to-1.

Pure Assembly Language Code 

When does it pay to write an entire number crunching program in
assembly language? In my opinion, never. For linear problems, writing the
entire program in assembly language has no significant speed advantage 
over using a small number of strategically chosen assembly language 
modules. (This is the approach we follow in the second and third parts 
of the book.) For non-linear problems, where isolating re-usable modules 
is difficult, writing special assembly language programs does increase 
speed over using a compiler, but only at an unreasonable cost in terms 
of programming effort. 

Two final conclusions: 

• If most of your number crunching is on linear operations—and most of the 
world's is—your best overall bet is probably the BASIC interpreter and a 
small set of assembly language routines, either the routines appearing in 
Parts II (Chapters 5-8) and III (Chapters 9-15) or another subroutine package
you purchase commercially. 

• If a good part of your number crunching is non-linear, your best bet is 
probably the combination of BASIC compiler and 8087. While assembly 
language routines are still substantially faster than BASIC, BASIC is far 
more convenient. 

On to Chapter 4 

Just how does the 8087 stack up against other computers? In the next 
chapter we insert a few of our strategic modules in BASIC programs and 
run some timing tests. 



Benchmarks 


With the advent of the 8087, moderate-to-large scale numerical computing 
can now be done on a microcomputer. The 8087 increases the
computational range of the microcomputer by one to two orders of magnitude.

The 8087 brings the "minimum-efficient-scale" of computing down to 
the personal level. In the past, a mainframe computer that cost 100 times 
more than a personal computer would have been thousands or tens of
thousands of times faster. While the 8087 remains several times slower than
powerful mainframes, an 8087-equipped PC also costs tens or hundreds 
of times less. So today, the 8087 has made the personal computer a cost 
effective number cruncher. 

Historically, large computers have always been more cost efficient, in
terms of raw computational power, than smaller computers. Very large
mainframes are more cost efficient than minis; minis are more cost
efficient than micros. Just as the advent of "super-mini" computers a few
years ago closed most of the gap between minicomputers and
mainframes, the 8087 closes most of the gap between personal and
minicomputers. To help you draw your own conclusions, speed benchmarks
for a range of machines appear below.

Comparing Benchmarks 

Speed and accuracy ratings are presented below for a number of different 
combinations of hardware and software. Before you start drawing
conclusions, understand what benchmarks do and do not tell us.

Benchmark programs are used to compare various combinations of 
software and hardware by executing the same program under controlled 
conditions. We've continued here with the timing of the two problems 
examined in Chapter 1. The first benchmark program multiplies two 50 
by 50 matrices in order to illustrate the 8087's power in linear operations. 
The second benchmark program, taking 5,000 square roots, illustrates 
the 8087's non-linear calculations. Please realize that benchmark
comparisons have some limitations.
First, these benchmark problems are not intended to be "fair." I picked
two problems which show off the capabilities of the 8087. They show 
the kind of results the "number crunching" user can reasonably expect, 
which aren't necessarily the results a "typical" user might expect and 
are totally unrelated to what a "word processing" user will see. 

Second, our benchmark programs are "tuned" to be efficient on the 
8087. For example, we've run most of the comparison programs in BASIC 
because BASIC is the dominant language on personal computers. On a 
larger computer, Fortran or APL or some other computer language may 
be more efficient than BASIC. If we were starting on one of these
machines, we might well program in a language other than BASIC.

Even if not totally "fair," these benchmarks do give a pretty good idea 
of what the 8087 will do. The first set of benchmarks below compares
timings on an IBM Personal Computer with and without the 8087. The 
second set of benchmarks compares the 8087 to several other micro, mini, 
and mainframe computers. 

IBM Personal Computer Benchmarks 

The IBM PC is the most popular of the "second generation," 16-bit per¬ 
sonal computers. Internally, the PC uses an Intel 8088 microprocessor 
running at a "clock speed" of 4.77 megahertz. It is worth knowing for 
purposes of comparison that some of the 8088-based personal computers 
on the market run at a 5 megahertz "clock," and are just a little bit faster. 
Also, computers based on the 8088's "big brother," the 8086, are quite 
a bit faster. 

For this benchmark, we've taken Table 1-1 from Chapter 1 and
added a third alternative, the IBM BASIC compiler. Table 4-1 shows
execution speeds for both matrix multiplication and the square root
problem using IBM's pre-8087 BASIC interpreter, IBM's pre-8087 BASIC
compiler, and our own assembly language modules.


Table 4-1. BASIC versus 8087 speed benchmarks (time in seconds).

                      50 by 50 matrix      5,000
Program               multiplication       square roots

BASIC interpreter         1200                 52
BASIC compiler             140                  6
8087 routine                 8                  0.35


The first two rows show why people turn to compilers. The IBM BASIC 
compiler beats the BASIC interpreter by around eight to one. Our 8087 
routines beat the compiler times by a factor of 20! 
“Outsider” Benchmarks

How does an 8087-equipped personal computer compare with "other
people's" equipment? The comparisons below repeat our benchmarks
on several popular combinations of hardware and software. 

Please don't read these comparisons as "better" or "worse." The hard¬ 
ware used runs from an Apple 11+ to an IBM 3081. The Apple isn't as 
fast as the PC, but then it doesn't cost as much. An IBM 3081 is faster 
than the PC, but it won't fit on your desk.. 

The comparisons are run on four machines: 

• Apple II+—Many people's favorite first-generation personal
computer. Both programs used the Apple's built-in Applesoft BASIC
interpreter.

• DEC 2060—A moderate size mainframe computer used by many
universities to provide time-sharing services. (Manufactured by
Digital Equipment Corporation.) Both programs were executed using
compiled BASIC. DEC-2060 BASIC includes a matrix multiplication 
function which we used for the first program. 

• VAX 780—A 32-bit "super-mini" computer, very popular for
moderate size number crunching applications. (Manufactured by Digital
Equipment Corporation.) These test programs were written in the
popular scientific language FORTRAN, and executed using the VAX's
optimizing compiler.

• IBM 3081—The IBM 3081 is a very large mainframe computer costing 
millions of dollars. Both programs were written using the "Stanford 
BASIC" interpreter. We again used a built-in matrix multiplication 
function for the first program. 

The benchmark results appear in Table 4-2. 


Table 4-2. Micro, mini, and mainframe speed benchmarks (time in seconds).

                      50 by 50 matrix      5,000
Program/Computer      multiplication       square roots

8087 routine                8                  0.35
Apple II+ BASIC          1796                130
DEC 2060 BASIC              5.2                0.40
VAX 780 FORTRAN             1.6                0.20
IBM 3081 BASIC              0.11               0.26


As we cautioned above, you need to be careful about benchmarks. The 
8087 routines make optimal use of the 8087's potential. (The 8087 routines 
appear in later chapters, so you can examine their innards if you wish.) 
The programs on the other machines use standard programming
techniques, and so make moderate to excellent use of the hardware's
potential.

Caveats notwithstanding, Table 4-2 tells us quite a bit about how to
classify an 8087-equipped personal computer. When it comes to number
crunching, the 8087 doesn't just make a fast micro—it creates the
equivalent of a slow super-mini or a slow mainframe computer!




Introduction to 8087 
Architecture 

This chapter provides a detailed, technical description of 8087
architecture. The 8087 instruction set is described in Chapters 6 and 12. (For
hardware and electronic details, see Intel's iAPX 86,88 User's Manual, the 
definitive source on the 8087.) 

More detail is given in this chapter than the typical 8087 user need be 
concerned with. You may want to browse through this chapter and then 
proceed directly to the description of the simple instruction set in Chapter 
6 . 

Co-processor Organization 

The 8087 is designed as a co-processor for the 8088 CPU. Both the 8087 and the 8088 "look" at each instruction fetched from memory. The 8087 acts on its own instructions and ignores those belonging to the 8088. When the 8088 sees an 8087 instruction, which is an 8088 ESCape instruction, it calculates the address of any data referenced by the instruction and reads (but ignores) one byte of data from this address. Otherwise, the 8088 treats the 8087 instruction as a null operation. The 8087 copies the address calculated by the 8088 and uses it to store or fetch data to and from memory. This division of labor allows the 8087 and the 8088 to execute simultaneously, considerably enhancing total system performance.

To ensure properly coordinated parallel operation, 8087/8088 programs must obey two synchronization rules:

• The 8088 must not change a memory location referenced by an 8087 
instruction until the 8087 is finished. The 8088 is free to change its 
own internal registers and flags. 

• A second 8087 instruction must not be fetched until the current operation is complete. (Under special circumstances it is possible to safely violate this rule, but such circumstances do not generally occur in application programs.)

Synchronization (obedience to both rules) is achieved through judicious use of the 8088 WAIT instruction. The WAIT instruction tells the 8088 to suspend processing until its TEST line becomes active. (The 8088 checks the TEST line status about once every microsecond.) When the 8087 begins an instruction, it sets the TEST line to inactive; when the instruction is complete, it sets the TEST line back to active.

The programmer is responsible for seeing that the first rule is obeyed. To ensure synchronization, code an FWAIT instruction after an 8087 instruction and before an 8088 instruction whenever the two instructions access the same memory location. (The FWAIT may be omitted if neither instruction changes the memory location.) FWAIT generates an 8088 WAIT instruction. (The mnemonic "FWAIT," for "floating wait," is a software convention.) FWAIT holds the 8088 until the 8087 operation is complete, thus preventing violation of the first rule.

Responsibility for implementing the second rule is left to the assembler. 
The assembler automatically places a WAIT instruction in front of every 
8087 instruction. Thus the 8088 will suspend processing and not fetch 
another 8087 instruction so long as a previous 8087 instruction is still 
being executed. 

Programs violating either of the two rules will have unpredictable results. Possible outcomes include the computer coming to a dead halt (if you are lucky) and random numbers being presented as final results (if you are not so lucky).

Internal 8087 Registers 

Five internal data areas are accessible by the 8087 programmer. These 
are the register stack, the status word, the control word, the tag word, and 
the exception pointers. 

8087 computation is organized around eight 80-bit data registers. These 
registers form a pushdown stack, called the register stack. The register at 
the top of the stack is referred to as ST or ST(0); the register immediately 
below the top is ST(1); and so forth through ST(7). Many 8087 instructions 
implicitly reference ST(0) or both ST(0) and ST(1). Many instructions also push data onto or pop data off the stack. (The stack is actually organized as a chain, so that ST(0) is "below" ST(7). It is the programmer's responsibility to prevent stack overflow.) Stack operations are described in detail in Chapter 6.

The 16-bit status word shows the current state of 8087 operations. We make extensive use of the condition code bits in the status word, which indicate the result of 8087 comparison operations. The status word also shows whether any exceptions (computational errors) have occurred, whether the 8087 is busy, whether the 8087 has requested to interrupt the 8088, and which of the eight stack registers is currently the top of the stack. These elements are primarily used for systems programming. Figure 5.1 shows the layout of the status word.


Bit 15        B         Busy
Bit 14        C3        Condition code (1)
Bits 13-11    ST        Stack top pointer (2)
Bits 10-8     C2-C0     Condition code (1)
Bit 7         IR        Interrupt request
Bit 6         (reserved)
Bits 5-0      PE, UE, OE, ZE, DE, IE
              Exception flags (1 = exception has occurred): precision, underflow, overflow, zerodivide, denormalized operand, invalid operation

(1) See descriptions of compare, test, examine and remainder instructions for condition code interpretation.

(2) ST values:
000 = register 0 is stack top
001 = register 1 is stack top
. . .
111 = register 7 is stack top

Figure 5.1. (Used with permission of Intel Corporation.)
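A small decoding routine may make the layout in Figure 5.1 more concrete. The Python sketch below assumes the status word has already been copied into an ordinary 16-bit integer. On the 8087 you would do this by storing it to memory with FSTSW. The function name decode_status_word and its dictionary keys are our own illustrative choices and do not come from any Intel or IBM software.

```python
def decode_status_word(sw: int) -> dict:
    """Split a 16-bit 8087 status word into the fields shown in Figure 5.1."""
    if not 0 <= sw <= 0xFFFF:
        raise ValueError("status word must fit in 16 bits")

    def bit(n: int) -> int:
        return (sw >> n) & 1

    return {
        "busy": bit(15),
        "C3": bit(14),
        "stack_top": (sw >> 11) & 0b111,  # physical register that is ST(0)
        "C2": bit(10),
        "C1": bit(9),
        "C0": bit(8),
        "interrupt_request": bit(7),
        # bit 6 is reserved
        "exceptions": {                   # 1 = exception has occurred
            "precision": bit(5),
            "underflow": bit(4),
            "overflow": bit(3),
            "zerodivide": bit(2),
            "denormalized": bit(1),
            "invalid": bit(0),
        },
    }

# Example: C3 set, zerodivide flag set, register 0 on top of the stack.
print(decode_status_word(0x4004))
```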


The 16-bit control word allows a number of 8087 options, described below under "Control Options," to be set under program control. These include the exception and interrupt-enable masks, which are primarily of interest to systems programmers. Other options, defining rounding, infinity, and precision controls, are occasionally used to control the results of numerical operations. Figure 5.2 shows the layout of the control word.

The tag word has two bits for each stack register to indicate whether the contents of the register are valid, zero, special, or empty. The exception pointers record the address of the most recent 8087 instruction and of its memory operand, for use by exception handlers. Neither the tag word nor the exception pointers are normally of any interest to application programmers.

Control Options 

By manipulating the control word, you can change the way the 8087 
handles rounding, infinity, and precision. 

The 8087 offers four methods of rounding off answers that cannot be represented exactly in the available number of bits. The options are round to nearest, round down (toward minus infinity), round up (toward plus infinity), and chop (truncate toward zero). Round to nearest is the default.

Bits 15-13    (reserved)
Bit 12        IC        Infinity control (4)
Bits 11-10    RC        Rounding control (3)
Bits 9-8      PC        Precision control (2)
Bit 7         IEM       Interrupt-enable mask (1)
Bit 6         (reserved)
Bits 5-0      PM, UM, OM, ZM, DM, IM
              Exception masks (1 = exception is masked): precision, underflow, overflow, zerodivide, denormalized operand, invalid operation

(1) Interrupt-Enable Mask:
0 = Interrupts Enabled
1 = Interrupts Disabled (Masked)

(2) Precision Control:
00 = 24 bits
01 = (reserved)
10 = 53 bits
11 = 64 bits

(3) Rounding Control:
00 = Round to Nearest or Even
01 = Round Down (toward -∞)
10 = Round Up (toward +∞)
11 = Chop (Truncate Toward Zero)

(4) Infinity Control:
0 = Projective
1 = Affine

Figure 5.2. (Used with permission of Intel Corporation.)
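You can try out the four rounding methods on exact values. The Python sketch below is only a model. It uses the standard fractions module rather than real 8087 registers, and the function round_8087 and its mode names are hypothetical. With frac_bits=0 it rounds to a whole number. A positive frac_bits rounds to that many bits after the binary point, which is the same decision the 8087 makes at the last bit of its significand.

```python
import math
from fractions import Fraction

def round_8087(x, mode: str, frac_bits: int = 0) -> Fraction:
    """Round x to a multiple of 2**-frac_bits using one of the four 8087 modes."""
    scale = Fraction(2) ** frac_bits
    y = Fraction(x) * scale
    if mode == "nearest":    # RC = 00: nearest, ties go to the even neighbor
        r = round(y)         # Fraction.__round__ rounds half to even
    elif mode == "down":     # RC = 01: toward minus infinity
        r = math.floor(y)
    elif mode == "up":       # RC = 10: toward plus infinity
        r = math.ceil(y)
    elif mode == "chop":     # RC = 11: truncate toward zero
        r = math.trunc(y)
    else:
        raise ValueError(f"unknown rounding mode: {mode!r}")
    return Fraction(r) / scale

for value in (Fraction(5, 2), Fraction(7, 2), Fraction(-5, 2)):
    print(value, {m: str(round_8087(value, m)) for m in ("nearest", "down", "up", "chop")})
```

Under round to nearest, 2.5 rounds to 2 and 3.5 rounds to 4, because ties go to the even neighbor. Rounding down turns -2.5 into -3, while rounding up or chopping turns it into -2.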

The 8087, unlike most computers, has a well-defined representation of infinity. The 8087 produces the proper result when calculating mathematical functions with infinite arguments, at least when a mathematically well-defined result exists. For example, 5/infinity yields zero. Both positive and negative infinity may be represented.

Two modes of "infinity control" are offered on the 8087: affine closure and projective closure. Under affine closure, positive and negative infinity are regarded as being at opposite "ends" of the number line. Under projective closure, positive and negative infinity are considered equal, as if the two "ends" of the number line bent around and came together. Relative comparisons between finite numbers and infinity are permitted under affine closure, but not under projective closure. Projective closure is the default.

Precision on the 8087 can be set to 64, 53, or 24 bits of accuracy, corresponding to the temporary real, double precision, and single precision data types. This option is offered so that the 8087 may comply with certain industry standards which offer only reduced accuracy, and so that 8087 computation can be made compatible with less accurate computers. Aside from the compatibility issue, the only value in using less than the full 64 bits of accuracy is the educational value of learning that more accuracy is better. Default precision is 64 bits.
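Each control option is just a bit field, so a short routine can decode or change it. This Python sketch is hypothetical: the names decode_control_word and set_rounding are ours. It assumes the control word is held in an ordinary integer, as it would be after the 8087 stores it to memory with FSTCW. The example value 03FFh encodes exactly the defaults described above: projective closure, round to nearest, 64-bit precision, and every exception and interrupt masked.

```python
PRECISION_BITS = {0b00: 24, 0b01: None, 0b10: 53, 0b11: 64}  # None = reserved
ROUNDING = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "chop"}

def decode_control_word(cw: int) -> dict:
    """Split a 16-bit 8087 control word into the fields of Figure 5.2."""
    if not 0 <= cw <= 0xFFFF:
        raise ValueError("control word must fit in 16 bits")
    return {
        "infinity": "affine" if (cw >> 12) & 1 else "projective",
        "rounding": ROUNDING[(cw >> 10) & 0b11],
        "precision_bits": PRECISION_BITS[(cw >> 8) & 0b11],
        "interrupts_masked": bool((cw >> 7) & 1),
        "exception_masks": cw & 0x3F,  # bits 5-0, 1 = exception is masked
    }

def set_rounding(cw: int, mode: str) -> int:
    """Return cw with only the rounding control field (bits 11-10) replaced."""
    code = {name: value for value, name in ROUNDING.items()}[mode]
    return (cw & ~0x0C00) | (code << 10)

defaults = decode_control_word(0x03FF)
print(defaults)
print(hex(set_rounding(0x03FF, "chop")))   # 0xfff
```

Because set_rounding changes one field with a mask, the precision, infinity, and masking options stay exactly as they were.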

Exception Masking 

The 8087 traps several important computational errors. When such an error occurs, the 8087 raises an exception condition. Exceptions may be unmasked ("unmasked" means exposed to the 8088), in which case a program interrupt occurs to permit user-supplied exception handling software to take control. Usually, however, exceptions are masked (hidden from the 8088). The 8087 holds onto any masked exception and executes an internal error correction procedure. For example, if your program attempts to divide a number by zero, the 8087 will set the answer to infinity under exception masking.

Table 5-1 presents the six exceptions and the most common masked 
response. Note that execution is never halted by a masked response. As 
a default, all exceptions are masked. See Appendix 2 for a full description 
of the masked responses to each exception. 


Table 5-1. Common masked response to 8087 exceptions.

Exception              Most Common Masked Response
Invalid operation      Return indefinite
Zerodivide             Return properly signed infinity
Overflow               Return properly signed infinity
Underflow              Denormalize result
Denormalized           Memory operand: proceed as usual
                       Register operand: convert to unnormal
Precision              Round result

Note: The terms "denormal" and "unnormal" are defined under Special Data Types, below.


Number Systems 

The 8087 "understands" floating point, integer, and packed decimal numbers. For number crunching, floating point numbers are by far the most important.

Floating Point Numbers 

In order to accommodate a wide range of values, computers store numbers in a "floating point" or "real" representation. Essentially, floating point is the computer's version of scientific notation. For example, in standard scientific notation the fraction "negative one-half" can be written out as

$$-5.0 \times 10^{-1}$$

Scientific notation splits the representation of a number into three sections. The "sign field" tells us the sign of the number; in the case above, the leading "-" indicates a negative number. Next, the "significand field," 5.0 above, gives the number's significant digits. (The significand field is also called the "mantissa.") The third section is the "exponent field." The "-1" above tells us to multiply the significand by ten to the minus one power, or, equivalently, to shift the decimal point one place to the left.
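You can also pull the three fields apart in code. In the Python sketch below, the function names are our own. decimal_fields uses the standard decimal module, which keeps the significant digits as an integer. As a result, -0.5 comes back as sign 1, digits (5,), and exponent -1, which means -5 × 10^-1. binary_fields does the same job in base two, as the 8087 does. math.frexp normalizes to 0.5 <= m < 1, so the sketch doubles the significand to get the 1 <= m < 2 form used on the 8087.

```python
import math
from decimal import Decimal

def decimal_fields(text: str):
    """(sign, digits, exponent): value = (-1)**sign * digits-as-integer * 10**exponent."""
    sign, digits, exponent = Decimal(text).as_tuple()
    return sign, digits, exponent

def binary_fields(x: float):
    """(sign, significand in [1, 2), exponent) of a finite non-zero float, base 2."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("needs a finite, non-zero value")
    m, e = math.frexp(abs(x))   # abs(x) == m * 2**e with 0.5 <= m < 1
    return (1 if x < 0 else 0), m * 2, e - 1

print(decimal_fields("-0.5"))   # sign 1, digits (5,), exponent -1
print(binary_fields(-0.5))      # (1, 1.0, -1): -1.0 times 2**-1
```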

The 8087 stores floating point numbers in a form of scientific notation. The exact bit patterns are laid out for the computer's convenience, so they are a little less than obvious to humans. Fortunately, we almost never need to concern ourselves with such minute detail. While exact bit patterns are covered below, there are really only three facts to know about each data type:

1. How many bytes of memory are used up to store a number? 

2. How many digits of accuracy are retained in a number? 

3. How wide is the range of numbers which can be represented? That 
is, how large an exponent can be used? 

The answers to 1 through 3 are shown in Table 5-2. 


Data Types 

The seven regular 8087 data types are shown in Table 5-2. A brief discussion of the use of each type appears below.


Table 5-2. 8087 data types.

Data Type                  Bits   Significant Digits   Range
Word Integer               16     4                    -32,768 to 32,767
  (BASIC Integer)
Short Integer              32     9                    -2 × 10^9 to 2 × 10^9
Long Integer               64     18                   -9 × 10^18 to 9 × 10^18
Packed Decimal             80     18                   18 decimal digits + sign
Short Real                 32     6 or 7               10^-37 to 10^38
  (BASIC Single Precision)
Long Real                  64     15 or 16             10^-307 to 10^308
  (BASIC Double Precision)
Temporary Real             80     19                   10^-4932 to 10^4932
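The "Significant Digits" column for the real types follows from the number of significand bits, because each binary bit is worth log10(2), or about 0.301, decimal digits. The short Python sketch below makes the arithmetic explicit. It uses significand bits (24, 53, and 64, counting the leading 1), not the total bits listed in the table. The names are our own.

```python
import math

# Significand bits, counting the leading 1 (implicit in short and long real,
# stored explicitly in temporary real).
SIGNIFICAND_BITS = {"short real": 24, "long real": 53, "temporary real": 64}

def decimal_digits(bits: int) -> float:
    """Decimal digits carried by a binary significand of the given width."""
    return bits * math.log10(2)

for name, bits in SIGNIFICAND_BITS.items():
    print(f"{name:15s} {bits:2d} bits ~ {decimal_digits(bits):5.2f} digits")
```

The results, about 7.22, 15.95, and 19.27 digits, account for the table's "6 or 7," "15 or 16," and "19."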
Short real. Short real corresponds to BASIC's single precision data type. Micros have less storage than mainframe computers, and real-world data rarely has more than six or seven digits of accuracy, so this data type is commonly used for economical storage of basic input data.

Long real. Long real corresponds to BASIC's double precision data type. 
As a rule, most calculations should be done in double precision in order 
to minimize the effect of round-off error in intermediate steps. 

Temporary real. Whatever the data type in memory, the 8087 converts all numbers to the temporary real format for internal use. The significand of the temporary real format holds 64 bits, so every other data type can be loaded into a temporary real without loss of precision.

By designing the 8087 around the temporary real concept, Intel has 
simplified the application programmer's life in several important ways: 

• Since all data types are converted to temporary real by the hardware, the programmer rarely needs to worry about explicit type conversions. It is just as easy for the programmer to multiply a double precision floating point number by a packed decimal number as it is to multiply two integers. (Of course, when storing a number back in memory, the programmer remains responsible for ensuring that the destination data type is large enough to hold the result being stored.)

• The range for temporary reals is (almost) infinite: the exponent range is 10 to the ±4932. As a result, overflows and underflows are almost always caused by a bug in either the data or the program, and only rarely indicate a numerical computing error.

• The temporary real has 19 significant digits. Even when a long series of intermediate calculations produces significant cumulative round-off error, the loss of 3 or 4 digits of accuracy still leaves an accurate double precision answer. With the 8087 onboard, an IBM Personal Computer is more accurate than the standard IBM mainframe!

Word integer. Word integer corresponds to BASIC's integer data type. 
A word integer occupies two bytes of storage and is principally used to 
index arrays and other data structures. 

Short integer. A four-byte integer. Not usually used in numerical programming.

Long integer. An eight-byte integer. Not usually used in numerical programming.

Packed decimal. Packed decimal representation is used for business and data processing operations. A packed decimal uses 10 bytes of memory and contains 18 decimal digits. Unlike the three preceding data types, the packed decimal form uses a decimal rather than a binary representation. Each of the decimal digits 0-9 is represented by four binary bits. These decimal digits are then "packed" two to a byte.
Business and data processing programs generally spend much more time converting data between external (ASCII) and internal (binary) representation than doing arithmetic. Conversion between ASCII and packed decimal representation is quite easy. (Also, some data processing languages, such as COBOL, use packed decimal representation as a standard data type.)
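To see why conversion from ASCII is easy, note that the ASCII codes for '0' through '9' are 30h through 39h. The low four bits of each character are therefore already the digit. The Python sketch below builds the 10-byte packed decimal form from a string of up to 18 digits; the name ascii_to_packed is hypothetical. It assumes the byte layout given under Packed Decimal Representation later in this chapter: two digits per byte, least significant byte first, and the sign in the top bit of the last byte.

```python
def ascii_to_packed(text: str) -> bytes:
    """Convert an ASCII integer such as "-127" to 10-byte packed decimal.

    The result is in memory order: byte 0 holds the two lowest digits and
    the top bit of byte 9 holds the sign.
    """
    if text[:1] in ("+", "-"):
        sign, digits = text[0], text[1:]
    else:
        sign, digits = "+", text
    if not 1 <= len(digits) <= 18 or any(c not in "0123456789" for c in digits):
        raise ValueError("need 1 to 18 decimal digits")
    digits = digits.rjust(18, "0")
    out = bytearray(10)
    for i in range(9):                          # nine bytes, two digits per byte
        tens = ord(digits[16 - 2 * i]) & 0x0F   # ASCII '0'-'9' are 30h-39h,
        ones = ord(digits[17 - 2 * i]) & 0x0F   # so the low nibble is the digit
        out[i] = (tens << 4) | ones
    out[9] = 0x80 if sign == "-" else 0x00      # bits 72-78 unused, bit 79 = sign
    return bytes(out)

print(ascii_to_packed("-127").hex())   # 27010000000000000080
```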


Data Type Hardware Representations 

The 8087 knows exactly where each and every little bit goes. This is fortunate, because the physical and logical orders in which numbers are placed in memory differ, and the difference can easily confuse us human types. However, the physical layout is easier for the machinery to handle and isn't relevant to programmers, except occasionally when trying to debug a machine language program. The exact hardware representations are described here for the sake of completeness.

Logically, all the data types are laid out left to right. The left-most bit 
is the most significant. Thus, a 16-bit integer is represented by a string 
of 16 bits running from the high-order bit 15 on the left to the low-order 
bit 0 on the right. Each of the seven data types is laid out in this way, 
as illustrated by Figure 5.3. 

Physically, the right-most logical byte comes first. For example, suppose a 16-bit integer is stored in memory locations 100 and 101. The low-order bits, 7-0, are in byte 100, and the high-order bits, 15-8, are in byte 101. The same "reversal" holds for all the data types. This format is used throughout the 8088/8086 family and is common to many microprocessors. See Figure 5.4.
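Python's standard struct module can pack the same value in either byte order, which makes the "reversal" easy to see. In the sketch below, the '<' format prefix gives the little-endian order used by the 8088/8086 family, and '>' gives the logical order, with the most significant byte first. The value 1234h is just an example.

```python
import struct

value = 0x1234
in_memory = struct.pack("<h", value)   # bytes as stored at locations 100, 101
logical = struct.pack(">h", value)     # high-order byte written first

print(in_memory.hex())                 # 3412
print(logical.hex())                   # 1234
print(struct.pack("<d", 1.0).hex())    # 000000000000f03f: sign/exponent byte last
```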


Floating Point Representation 

8087 floating point representation makes a number of concessions to the 
computer's convenience. 

• Numbers are represented, unsurprisingly, by a string of binary bits 
rather than decimal numbers. 

• The position of the "binary point" is implicit. Since computer memory contains only zeros and ones, there is no convenient way to explicitly write in a decimal point. In ordinary scientific usage we
write 153.7 as 1.537E2. (Computers typically use "E" in this context 
to indicate multiplication by a power of ten.) If our type font had 
no period, we might agree to write 153.7 as 1537E2 and agree that 
a decimal point is implicit after the first digit. On the 8087, the binary 
point is assumed to appear immediately to the right of the most 
significant bit of the significand. 
[Figure 5.3 diagram: bit layouts of the word integer, short integer, long integer, packed decimal, short real, long real, and temporary real formats, drawn with significance increasing to the left.]

NOTES:
S = Sign bit (0 = positive, 1 = negative)
dn = Decimal digit (two per byte)
X = Bits have no significance; 8087 ignores when loading, zeros when storing.
^ = Position of implicit binary point
I = Integer bit of significand; stored in temporary real, implicit in short and long real
Exponent Bias (normalized values):
Short Real: 127 (7FH)
Long Real: 1023 (3FFH)
Temporary Real: 16383 (3FFFH)

Figure 5.3. 8087 data type bit patterns.
(Used with permission of Intel Corporation.)


• Floating point numbers are represented in a "normalized" format. 
The leading bit of a floating point number is always a one. The 
computer shifts the significand left or right, while decreasing or 
increasing the exponent, in order to maintain this format. (However, 
see Special Data Types, below, for some exceptions.) 

• Since single and double precision numbers are always normalized, the leading bit is always a one and therefore needn't be stored. It isn't, except in the 80-bit temporary real format, which stores the leading bit explicitly.

• Exponents in scientific notation can be either positive or negative. Rather than store an explicit sign bit for exponents, the 8087 uses a "biased exponent." The exponent field holds the sum of the true exponent and a positive constant. For example, the exponent stored in a single precision real number is the true exponent plus 127. The exponent bias, chosen to provide the widest possible range given the number of bits assigned to hold the exponent, is 127 for single precision, 1023 for double precision, and 16383 for temporary real.

To illustrate floating point representation, the significand of 2.0 is "[1]00 . . ." (where the "[1]" indicates the leading 1 is assumed but not stored and "00 . . ." indicates enough zeros to fill out the rest of the significand field). Since 2.0 is 1.0 times 2 to the first power, its single precision exponent field is 1 + 127 = 128. Examples of significand and exponent fields for other numbers are: 1/2 is "[1]00 . . ." and 126; 3.0 is "[1]10 . . ." and 128; and 4.0 is "[1]00 . . ." and 129.

• Zero is represented by all exponent and significand bits set to zero. (The sign bit may be either positive or negative, without significance for any arithmetic or comparison operation.)

S: Sign bit
MSB/LSB: Most/least significant bit
MSD/LSD: Most/least significant decimal digit
(X): Bits have no significance

S: Sign bit
MSE/LSE: Most/least significant exponent bit
MSF/LSF: Most/least significant fraction bit
I: Integer bit of significand

Figure 5.4. 8087 data type byte patterns.
(Used with permission of Intel Corporation.)
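You can check these fields with Python's struct module. Python floats are double precision, so packing with the 'f' format rounds the value to the short real format. The sketch below then splits the 32 bits into sign, biased exponent, and the stored 23-bit fraction; the function name is our own. For 0.5, 1.0, 2.0, 3.0, and 4.0 it prints biased exponents of 126, 127, 128, 128, and 129. Only 3.0 has a non-zero stored fraction.

```python
import struct

def short_real_fields(x: float):
    """(sign, biased exponent, stored 23-bit fraction) of x rounded to short real."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # true exponent + 127
    fraction = bits & 0x7FFFFF       # the leading 1 is implicit, not stored
    return sign, exponent, fraction

for x in (0.5, 1.0, 2.0, 3.0, 4.0):
    s, e, f = short_real_fields(x)
    print(f"{x:3.1f}: sign={s} exponent={e} (true {e - 127:+d}) fraction={f:023b}")
```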
Integer Representation

The three integer types are represented in "two's complement" format. Positive numbers are simply binary integers. Negative numbers are represented in the following way: if X is a positive integer, then -X is written as (NOT X)+1. The left-most bit of an integer is always a one for negative integers and a zero for zero or a positive integer.
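The (NOT X)+1 rule translates directly into code. In this Python sketch, the function names are our own and Python's ~ operator plays the role of NOT. Python integers have no fixed width, so the sketch masks the result to the field size afterward. For example, -127 in a word integer is FF81h.

```python
def twos_complement(x: int, bits: int) -> int:
    """Bit pattern (as a non-negative int) of x in a bits-wide integer field."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= x <= hi:
        raise OverflowError(f"{x} does not fit in {bits} bits")
    mask = (1 << bits) - 1
    if x >= 0:
        return x                    # positive numbers are plain binary
    return (~(-x) + 1) & mask       # -X is written as (NOT X) + 1

def from_twos_complement(pattern: int, bits: int) -> int:
    """Inverse: a 1 in the left-most bit marks a negative number."""
    if (pattern >> (bits - 1)) & 1:
        return pattern - (1 << bits)
    return pattern

print(hex(twos_complement(-127, 16)))   # 0xff81
```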


Packed Decimal Representation 


Packed decimal numbers are integers represented with a sign and exactly 18 decimal digits. Bits 0-3 hold the least significant digit, that is, the "one's place." Bits 4-7 hold the "ten's place," and so forth. Bits 72-78 are unused. (If an additional digit were stored here, it would not always be possible to convert a packed decimal number into an eight-byte integer.) The high-order bit, bit 79, holds the sign. If a decimal digit is not in the required range 0-9, the result of using the packed decimal number is undefined.
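Going the other way, you can decode a packed decimal value one nibble at a time. The Python sketch below assumes the 10 bytes are given in memory order, least significant byte first; packed_to_int is a hypothetical name. When a nibble falls outside 0-9, the 8087's result would be undefined. The sketch raises an error in that case instead.

```python
def packed_to_int(data: bytes) -> int:
    """Decode 10 bytes of 8087 packed decimal (memory order) into an integer."""
    if len(data) != 10:
        raise ValueError("packed decimal occupies exactly 10 bytes")
    value = 0
    for byte in reversed(data[:9]):       # byte 8 holds the two highest digits
        hi, lo = byte >> 4, byte & 0x0F
        if hi > 9 or lo > 9:
            # the 8087's result is undefined here; this sketch refuses instead
            raise ValueError(f"invalid BCD byte {byte:#04x}")
        value = value * 100 + hi * 10 + lo
    # bits 72-78 (low seven bits of byte 9) are unused; bit 79 is the sign
    return -value if data[9] & 0x80 else value

print(packed_to_int(bytes([0x27, 0x01] + [0] * 7 + [0x80])))   # -127
```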


As an exercise, try writing out a number in each of the seven formats. 
Figure 5.5 gives the hexadecimal representation of -127 for each format. 
(Note that 127 is 01111111 in binary or 7F in hexadecimal.) 
Figure 5.5. 8087 hexadecimal representation of -127.
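To check your answers to the exercise, the Python sketch below computes -127 in all seven formats. It relies on the standard struct module for the integer, short real, and long real formats. The packed decimal and temporary real helpers are our own hypothetical code, and temp_real_hex handles only finite, non-zero values. All results are written most significant byte first (the logical order); in memory the bytes appear in reverse.

```python
import math
import struct

def temp_real_hex(x: float) -> str:
    """80-bit temporary real of a finite, non-zero float, most significant byte first."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("sketch handles finite non-zero values only")
    m, e = math.frexp(abs(x))               # abs(x) = m * 2**e, 0.5 <= m < 1
    significand = int(math.ldexp(m, 64))    # 64 bits, integer bit explicit
    exponent = (e - 1) + 16383              # true exponent plus bias 3FFFh
    sign = 0x8000 if x < 0 else 0
    return f"{sign | exponent:04X}{significand:016X}"

def packed_hex(n: int) -> str:
    """Packed decimal: sign byte, then the 18 digits, one per hex nibble."""
    if abs(n) > 10**18 - 1:
        raise OverflowError("packed decimal holds at most 18 digits")
    return ("80" if n < 0 else "00") + f"{abs(n):018d}"

def seven_formats(n: int) -> dict:
    # ">" packs the most significant byte first, matching the logical layout
    return {
        "word integer": struct.pack(">h", n).hex().upper(),
        "short integer": struct.pack(">i", n).hex().upper(),
        "long integer": struct.pack(">q", n).hex().upper(),
        "packed decimal": packed_hex(n),
        "short real": struct.pack(">f", float(n)).hex().upper(),
        "long real": struct.pack(">d", float(n)).hex().upper(),
        "temporary real": temp_real_hex(float(n)),
    }

for name, hex_digits in seven_formats(-127).items():
    print(f"{name:15s} {hex_digits}")
```

For -127 it reports FF81, FFFFFF81, FFFFFFFFFFFFFF81, 80000000000000000127, C2FE0000, C05FC00000000000, and C005FE00000000000000.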


Special Data Types 

On most computers, every bit pattern represents a valid numerical value. In contrast, the 8087 reserves a large class of bit patterns to represent special non-numerical values. For almost all application programs, these special data types can be safely ignored. Here is a brief description of these types:

Denormal: Real numbers are usually stored in the normalized format described above. An underflow occurs when the result of an operation would require a biased exponent of zero or less. Rather than merely set the result to zero, the 8087 "stretches" the precision of the result by setting the exponent field equal to zero and shifting the significand right the appropriate number of places. Thus, denormal numbers can be recognized, when stored in memory, by the zero exponent field together with a non-zero significand field. A denormal is converted to an unnormal when loaded into the 8087 from memory or used in an arithmetic operation. Denormals are perfectly acceptable as operands for arithmetic instructions, with the critical exception of transcendental operations, which assume without checking that their operands are normals.

Unnormal: When a denormal is used in an arithmetic operation, the result is an unnormal. Unnormals exist only in temporary real format and can be recognized by a zero in bit 63 (as opposed to one for a normal). Unnormals are also perfectly acceptable in arithmetic operations (except that transcendental operations and unnormals don't mix). The result of an operation on an unnormal is a normal when possible and an unnormal otherwise. The existence of denormals and unnormals provides a major convenience to the applications programmer. Frequently, numerical algorithms create very small intermediate results. Most computers either halt with an underflow signal or set the intermediate result equal to 0.0. In contrast, 8087 routines continue to execute while maintaining maximum possible accuracy.

Zero: Zero hardly seems like a special data type. However, it is very 
useful to know how the processor treats operations involving zeros. 
Real zeros may be signed either positive or negative, but the sign is 
always ignored. The 8087 is extraordinarily well behaved when using 
zero in arithmetic operations. Where most processors would come to an unpleasant halt, the 8087 produces a sensible answer; for example, the result of 7/0 is infinity and the result of 0/0 is indefinite.

Pseudo-zero: Under certain rare circumstances, temporary reals may end up containing a type known as a pseudo-zero. The pseudo-zero behaves mostly like a zero. For most purposes, this type may be safely ignored.

Infinity: The real formats include the values plus and minus infinity. These are represented by a biased exponent of all ones and a significand with a leading one and trailing zeros. Infinity can be used as an argument for most 8087 arithmetic operations. Infinity in a register is tagged special (in the tag word).
Real indefinite: The 8087 produces the value indefinite as the masked response to an invalid operation. A real indefinite is indicated by a negative sign bit, all ones in the biased exponent, and a significand with two leading ones followed by trailing zeros. Indefinite in a register is tagged special.

Integer and packed decimal indefinite: For each integer type, the largest negative number (for example, -2^15, or -32,768, for a word integer) also represents indefinite. Packed decimal indefinite is represented by 16 leading ones with the trailing bits undefined. Use of integer and packed decimal indefinite should be avoided, as the 8087 treats integer indefinite as the largest negative number and gives undefined results after loading a packed decimal indefinite from memory.

NAN (Not-A-Number): Any value, except infinity, with a string of ones for the biased exponent is a member of the class NAN (Not-A-Number). NANs propagate through arithmetic operations. Thus you can design software that treats members of this class as being "special" in any way you'd like (except for real indefinite, which is reserved for the use described above). For example, particular NANs might be used to indicate unassigned memory locations while debugging a program or missing data in a statistical or accounting problem.
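The rules above for telling special values apart can be written as a short classifier. The Python sketch below works only on the 32-bit short real format, and its names are our own. It therefore cannot show unnormals or pseudo-zeros, which exist only in temporary real. Following the description above, it treats a negative sign, an all-ones exponent, and a significand of 1.1000... (the implicit 1 plus a set top fraction bit) as real indefinite.

```python
import struct

def classify_short_real(bits: int) -> str:
    """Classify a 32-bit short real bit pattern into the categories above."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if fraction == 0 else "denormal"
    if exponent == 0xFF:
        if fraction == 0:
            return "infinity"
        if sign == 1 and fraction == 0x400000:   # significand 1.1000...
            return "real indefinite"
        return "NAN"
    return "normal"

def bits_of(x: float) -> int:
    """Bit pattern of x after rounding it to short real."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

for x in (1.5, 0.0, 2.0 ** -127, float("inf")):
    print(x, classify_short_real(bits_of(x)))
```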

In the next chapter, we turn away from architectural detail and begin 
writing our first useful programs. 



Simple Instruction Set 


We write our first program in this chapter: a simple routine to calculate 
the sum of an array of numbers. Before preparing our program, we 
discuss the 8087's basic instructions. 

The 8087 has six instruction groups: data transfer, arithmetic, transcendental, constants, comparison, and processor control. We discuss data transfer, a few of the comparison instructions, and the basic arithmetic operations in this chapter. We defer discussion of the less frequently used instructions until Chapter 12. This chapter is divided into five sections. In the first section, we take a close look at the 8087 register stack. The next three sections look at the 8087 data transfer instructions, the 8087 basic arithmetic instructions, and the basic comparison instructions. In the last section we build our first program.


The Stack Mechanism 

The 8087 has eight 80-bit registers for holding data internally. On most 
computers, these registers would be numbered 0, 1, 2, 3, 4, 5, 6, and 7; 
and a typical instruction would be something like "add register 3 to 
register 4 and leave the sum in register 3." The 8087 uses a more elegant 
system for accessing registers—the stack. 

The stack method is invariably described by analogy to the plate holders 
found in cafeterias. A stack of plates is loaded on a spring with only the 
top plate visible. If you put a plate on the stack, all the other plates move 
down and only the new plate is accessible. Remove the top plate and all 
the ones below move up one place. On a computer, the action of adding 
an item to a stack is called a push (all the data is pushed down one 
position), and removing the top item is called a pop (all the data pops 
up one position). For the sake of efficiency, a computer doesn't actually 
move data up and down. Instead the computer changes a pointer which 
indicates which register is at the top of the stack. 

The register on top of the 8087 stack is called ST or ST(0). The 8087 
also allows you to reference registers below the stack top. The piece of 
data immediately below the stack top is called ST(1); one further down
is ST(2); and so on through ST(7). As we push and pop data from the 
stack, these names become attached to different registers. 

Figure 6.1 illustrates the stack in action. Initially, the stack is empty. 
Next, we push the number 3.14 onto the stack. Now, ST(0) has the value 
3.14 and ST(1) through ST(7) are undefined. Suppose we push 2.18 onto 
the stack. ST(0) holds 2.18 and ST(1) holds 3.14. If we pop the stack, 
then ST(0) will again point to the value 3.14. 


               Initially    After pushing 3.14    After pushing 2.18    After popping
ST(0)          EMPTY        3.14                  2.18                  3.14
ST(1)          EMPTY        EMPTY                 3.14                  EMPTY
ST(2)-ST(7)    EMPTY        EMPTY                 EMPTY                 EMPTY

Figure 6.1. 8087 stack mechanism.


Notice that the stack mustn't grow to be more than eight deep, since the 8087 has only eight internal registers. The 8087 leaves responsibility for watching the depth of the stack in your hands. If a program does nine pushes in a row, you'll get incorrect answers, but no error messages. (Technically, the 8087 registers are organized as a chain rather than a stack. On the ninth push, ST(7) becomes ST(0) and the previous contents of ST(7) are lost.)
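You can model the pointer-based stack in a few lines. In this Python sketch, the class and method names are hypothetical. Physical registers 0 through 7 never move; only the top pointer changes, and ST(i) simply names register (top + i) mod 8. Tags, exceptions, and number formats are left out. The text says a ninth push silently destroys data. The model raises an error at that point instead, so the mistake is visible.

```python
class RegisterStack:
    """Toy model of the 8087's eight registers and top-of-stack pointer."""

    def __init__(self):
        self.regs = [None] * 8   # None stands in for an "empty" tag
        self.top = 0

    def push(self, value):
        new_top = (self.top - 1) % 8      # a push moves the pointer, not the data
        if self.regs[new_top] is not None:
            # The real chip treats this as an invalid operation; with exceptions
            # masked it carries on, so answers go bad without any message.
            raise OverflowError("ninth push: stack overflow")
        self.top = new_top
        self.regs[new_top] = value

    def pop(self):
        value = self.regs[self.top]
        if value is None:
            raise IndexError("pop from an empty register")
        self.regs[self.top] = None        # mark empty, then move the pointer
        self.top = (self.top + 1) % 8
        return value

    def st(self, i: int = 0):
        return self.regs[(self.top + i) % 8]

s = RegisterStack()
s.push(3.14)
s.push(2.18)
print(s.st(0), s.st(1))   # 2.18 3.14
s.pop()
print(s.st(0))            # 3.14
```

This reproduces the sequence in Figure 6.1: after two pushes and one pop, 3.14 is back in ST(0).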


Data Transfer Instructions 

Data transfer instructions move data from memory into the 8087 (load), from the 8087 into memory (store), and between 8087 registers (exchange). Inside the 8087 all data is held in temporary real format. In memory, operands fall into one of the seven data types discussed in Chapters 2 and 5. As data moves into or out of the 8087, it is automatically converted between temporary real and other formats. Three rules summarize the way the 8087 distinguishes among the different types of data:
• Data stored internally in an 8087 register is always temporary real.

• Different instructions reference real, integer, and packed decimal 
types. 

• Within a given type, the 8087 distinguishes between arguments of 
different precision according to the amount of memory space the 
argument occupies. For example, an instruction which operates on 
reals treats an operand referencing a four-byte memory location as 
a single precision number and an operand referencing an eight-byte 
memory location as a double precision number. 

The data transfer instructions are summarized in Table 6-1. 


Table 6-1. 8087 data transfer instructions.

Real Transfers
  FLD      Load real
  FST      Store real
  FSTP     Store real and pop
  FXCH     Exchange registers

Integer Transfers
  FILD     Integer load
  FIST     Integer store
  FISTP    Integer store and pop

Packed Decimal Transfers
  FBLD     Packed decimal (BCD) load
  FBSTP    Packed decimal (BCD) store and pop

(Used with permission of Intel Corporation.)

(The typical execution time for each instruction appears to the right of 
the instruction name. Appendix 1 gives more precise timing information.) 


Real Transfer Instructions 

FLD source 13 microseconds 

FLD (load real) pushes the source data onto the stack by changing the top of stack pointer to point to the next available register and then copying the source data into this register. The source may be either a real memory operand or an 8087 register. FLD is the basic instruction for moving data into the 8087.

FST destination 23 microseconds 

FST (store real) copies the contents of the top element of the stack into 
the indicated destination, either an 8087 register or a single or double 
precision memory location. Data moved into memory is automatically
converted to single or double precision format. FST does not affect the 
depth of the stack or its contents. FST cannot be used to store a temporary 
real in memory. 

FSTP destination 23 microseconds 

FSTP (store real and pop) stores the top element of the stack and then 
pops the top of the stack. Unlike FST, FSTP will move a temporary real 
into memory. 

The "pop" is accomplished in two steps. First, the register currently 
at the top of the stack is marked "empty" (in the tag word). Second, the 
top of stack pointer is changed to point to the register logically "below" 
the current top of stack. Thus the instruction FSTP ST(0) pops the stack 
with no effective transfer. 

FXCH 3 microseconds 

FXCH destination 

FXCH (exchange registers) exchanges the stack top with the designated 
destination register. If the destination is not specified, ST(1) is assumed. 
Thus, FXCH with no destination swaps the contents of the two registers 
at the top of the stack. 
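The push and pop mechanics described for FLD, FST, FSTP, and FXCH can be modeled in a few lines. The class below is a hypothetical teaching model, not an emulator. It keeps eight registers, a top-of-stack pointer, and a tag word reduced to "empty" or "in use". It ignores data formats, timing, and every exception except an empty or full register.

```python
class Stack8087:
    """Toy model of the eight-register 8087 stack: values only, no data formats."""

    def __init__(self):
        self.regs = [0.0] * 8      # physical registers 0..7
        self.empty = [True] * 8    # tag word, reduced to "empty" or "in use"
        self.top = 0               # top-of-stack pointer

    def _phys(self, i):
        # ST(i) is the register i places "below" the top, wrapping around.
        return (self.top + i) % 8

    def st(self, i=0):
        p = self._phys(i)
        if self.empty[p]:
            raise ValueError(f"ST({i}) is empty")
        return self.regs[p]

    def fld(self, value):
        # FLD: point to the next available register, then copy the source in.
        nxt = (self.top - 1) % 8
        if not self.empty[nxt]:
            raise OverflowError("all eight registers are in use")
        self.top = nxt
        self.regs[nxt] = value
        self.empty[nxt] = False

    def fst(self):
        # FST: copy ST(0) out; stack depth and contents are unchanged.
        return self.st(0)

    def fstp(self):
        # FSTP: copy ST(0) out, mark it empty, then point to the register below.
        value = self.st(0)
        self.empty[self.top] = True
        self.top = (self.top + 1) % 8
        return value

    def fxch(self, i=1):
        # FXCH: swap ST(0) with ST(i); ST(1) is assumed when i is omitted.
        self.st(0)                 # raise if either register is empty
        self.st(i)
        a, b = self._phys(0), self._phys(i)
        self.regs[a], self.regs[b] = self.regs[b], self.regs[a]


s = Stack8087()
s.fld(1.0)          # ST(0) = 1.0
s.fld(2.0)          # ST(0) = 2.0, ST(1) = 1.0
s.fxch()            # ST(0) = 1.0, ST(1) = 2.0
print(s.fst(), s.fstp(), s.st(0))   # 1.0 1.0 2.0
```

Notice that FSTP changes only the tag and the pointer. The popped value is still physically in its register, but it is marked empty.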


Integer and Packed Decimal Data 
Transfer Instructions 

FILD source 12 microseconds 

FILD (load integer) pushes the integer memory operand onto the stack 
(converting it to a temporary real). 

FIST destination 21 microseconds 

FIST (store integer) rounds the value held in ST(0) and stores the resulting 
integer in the destination memory location. (A copy of the value is made 
before rounding so that the contents of ST(0) remain unchanged.) The 
destination may be either a word or short integer. You cannot store a 
long integer with FIST. 


FISTP destination 21 microseconds 

FISTP (store integer and pop) rounds the top of stack element and stores 
it in the destination memory location. The top of stack is then popped. 
Unlike FIST, FISTP will store into a long integer memory location. 
FBLD source 69 microseconds

FBLD (load packed decimal) pushes the contents of the memory source
location onto the top of the stack, converting the operand from packed
decimal to temporary real. No check is made to see that the data is a valid
packed decimal number. The result of loading invalid data is undefined and
should be carefully avoided. (The "B" in FBLD comes from an alternative
name for packed decimal representation, "BCD," or "Binary Coded Decimal.")
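Because FBLD performs no validity check, a program that builds packed decimal data itself should check the data before loading it. The Python sketch below encodes, checks, and decodes ten-byte packed decimal values under an assumed layout:

- The low nine bytes hold 18 decimal digits, two per byte, with the least significant digit in the low half of the first byte.
- Bit 7 of the tenth byte holds the sign.

The function names are hypothetical; Chapter 5 remains the reference for the format itself.

```python
def to_packed(n: int) -> bytes:
    """Encode an integer of at most 18 digits as 10 bytes of packed decimal.

    Assumed layout: bytes 0-8 hold two BCD digits each, least significant
    digit in the low half of byte 0; bit 7 of byte 9 is the sign.
    """
    if abs(n) >= 10**18:
        raise ValueError("more than 18 digits")
    digits = f"{abs(n):018d}"[::-1]              # least significant digit first
    body = bytes(int(digits[2 * k]) | (int(digits[2 * k + 1]) << 4) for k in range(9))
    sign = 0x80 if n < 0 else 0x00
    return body + bytes([sign])

def is_valid_packed(raw: bytes) -> bool:
    """FBLD does not check its operand, so check every digit half-byte first."""
    return len(raw) == 10 and all((b & 0x0F) <= 9 and (b >> 4) <= 9 for b in raw[:9])

def from_packed(raw: bytes) -> int:
    """Decode 10 bytes of packed decimal, refusing invalid data."""
    if not is_valid_packed(raw):
        raise ValueError("not a valid packed decimal number")
    value = 0
    for b in reversed(raw[:9]):                  # most significant byte first
        value = value * 100 + (b >> 4) * 10 + (b & 0x0F)
    return -value if raw[9] & 0x80 else value

print(to_packed(1234).hex())   # 34120000000000000000
```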

FBSTP destination 117 microseconds 

FBSTP (store packed decimal and pop) converts the contents of the top 
of stack register into packed decimal representation and transfers the 
converted number to the destination memory location. The top of stack 
is then popped. If the top of stack element is not already an integer, it is 
converted to one by adding 0.5 and truncating. This rounding operation 
sometimes differs from FIST, which operates under 8087 rounding control.
To make FBSTP respect rounding control, precede it with FRNDINT, which
rounds the stack top to an integer under the current rounding control.
(FRNDINT is described in Chapter 12.)
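The difference between the two rounding rules appears at exact halves. The sketch below compares "add 0.5 and truncate," as the text describes for FBSTP, with round-to-nearest-even. The latter is assumed here to be the default rounding-control setting that FIST and FRNDINT would use. The sketch covers non-negative values only, and both function names are hypothetical.

```python
import math

def fbstp_round(x: float) -> int:
    """Add 0.5 and truncate, as the text describes for FBSTP (x >= 0 only)."""
    if x < 0:
        raise ValueError("this sketch covers non-negative values only")
    return math.trunc(x + 0.5)

def nearest_even(x: float) -> int:
    """Round to nearest, ties to even: a model of the assumed default rounding control."""
    return round(x)     # Python's round() also sends exact halves to the even integer

for x in (2.4, 2.5, 3.5, 2.6):
    print(x, fbstp_round(x), nearest_even(x))
```

Only exact halves disagree here. For example, 2.5 becomes 3 under FBSTP's rule but 2 under round-to-nearest-even.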


Basic Arithmetic Instructions 

The 8087 has 21 basic arithmetic instructions, summarized in Table 6-2. 
Of these, 18 provide varieties of addition, subtraction, multiplication, 
and division. In addition to the standard use of these four basic operations,
the 8087 also allows "reversed" subtraction and "reversed" division.

In normal subtraction, the destination is replaced by the destination
minus the source. In reversed subtraction, the destination is replaced by
the source minus the destination. Reversed division operates analogously.
(No reversed operations are needed for addition and multiplication,
since both operations are commutative.)
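The six operations differ only in which operand ends up on which side. The Python function below is a hypothetical model, not 8087 code. It returns the value each operation leaves in the destination.

For example, with ST(0) as the destination, a reversed form lets a program compute the source minus ST(0) directly, without first exchanging registers with FXCH.

```python
def arith(op: str, destination: float, source: float) -> float:
    """Value left in the destination by ADD, SUB, SUBR, MUL, DIV, or DIVR."""
    if op == "ADD":
        return destination + source
    if op == "SUB":
        return destination - source
    if op == "SUBR":
        return source - destination    # reversed: the operands trade places
    if op == "MUL":
        return destination * source
    if op == "DIV":
        return destination / source
    if op == "DIVR":
        return source / destination    # reversed: the operands trade places
    raise ValueError(f"unknown operation {op!r}")

for op in ("SUB", "SUBR", "DIV", "DIVR"):
    print(op, arith(op, 8.0, 2.0))     # destination 8, source 2
```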

Including the reversed operations, there are six basic arithmetic
instructions. Each instruction comes in three formats: real, real-and-pop,
and integer. Thus, there are 18 total instructions.

Arguments in the real format may take the stack form, the register form, 
or the real-memory form. In the stack form, the destination is always ST(1) 
and the source is always ST(0). In the register form, one argument is the 
stack top, ST(0), and the other is any 8087 register. In real-memory form, 
the destination is ST(0) and the source is a location in memory. Only 
single precision and double precision types may be used in the real-memory 
form. 

The real-and-pop format uses the register form. After the operation, the
stack is popped. For example, FADDP ST(1),ST adds the top two stack
elements, stores the sum one element below the top of stack, and pops
the stack. After execution, the original contents of the stack top have
been discarded, the contents of ST(1) are replaced by the sum, and the
register that was formerly ST(1) pops up to the top of the stack.

The integer format references a word integer or short integer location in
memory as the source. ST(0) is the destination.


Table 6-2. 8087 arithmetic instructions.

Addition
  FADD      Add real
  FADDP     Add real and pop
  FIADD     Integer add

Subtraction
  FSUB      Subtract real
  FSUBP     Subtract real and pop
  FISUB     Integer subtract
  FSUBR     Subtract real reversed
  FSUBRP    Subtract real reversed and pop
  FISUBR    Integer subtract reversed

Multiplication
  FMUL      Multiply real
  FMULP     Multiply real and pop
  FIMUL     Integer multiply

Division
  FDIV      Divide real
  FDIVP     Divide real and pop
  FIDIV     Integer divide
  FDIVR     Divide real reversed
  FDIVRP    Divide real reversed and pop
  FIDIVR    Integer divide reversed

Other Operations
  FSQRT     Square root
  FSCALE    Scale
  FPREM     Partial remainder
  FRNDINT   Round to integer
  FXTRACT   Extract exponent and significand
  FABS      Absolute value
  FCHS      Change sign

(Used with permission of Intel Corporation.)
Implicit Operands

Stack operands may be either implicit or explicit. For example, in the stack 
form ST is always assumed to be the source and ST(1) is assumed to be 
the destination. For purposes of illustration, implicit arguments are shown 
below in curly brackets, as in {ST}, even though these operands are not 
coded in actual programs. 

Use of implicit arguments can lead the unwary programmer into great
confusion. Unfortunately, an instruction with two implicit arguments
has a different meaning from the same instruction followed by (the
implied) explicit arguments. By convention, use of two implicit arguments
tells the assembler that you wish to pop the register stack after executing
the instruction. For example, the instruction "FADD" implies the source
is ST and the destination is ST(1). But the assembler translates the
instruction as "FADDP ST(1),ST", which is quite different from "FADD
ST(1),ST". You can avoid a lot of trouble by not taking "advantage" of
this convention. Instead, make both arguments explicit.
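The trap is easy to see in a toy model. In the sketch below the stack is a Python list whose last item plays ST(0). Both function names are hypothetical, and the model only tracks values.

```python
def fadd_explicit(stack):
    """FADD ST(1),ST: ST(1) <- ST(1) + ST(0). Nothing is popped."""
    stack[-2] = stack[-2] + stack[-1]

def fadd_implicit(stack):
    """Bare FADD, which the assembler turns into FADDP ST(1),ST: add, then pop."""
    stack[-2] = stack[-2] + stack[-1]
    stack.pop()

a = [1.0, 2.0, 3.0]     # last item is ST(0), so ST(0) = 3.0 and ST(1) = 2.0
b = list(a)
fadd_explicit(a)
fadd_implicit(b)
print(a)   # [1.0, 5.0, 3.0]  still three deep
print(b)   # [1.0, 5.0]       one shallower
```

Both forms put the same sum in the old ST(1). Only the stack depth afterward differs, which is exactly the kind of difference that goes unnoticed until a later instruction reads the wrong register.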

The various combinations of instruction formats are summarized in 
Table 6-3. 


Table 6-3. 8087 arithmetic instruction formats.

Instruction      Mnemonic   Operand Forms                ASM-86 Example
Form             Form       destination, source

Classical stack  Fop        {ST(1),ST}                   FADD
Register         Fop        ST(i),ST or ST,ST(i)         FSUB ST,ST(3)
Register pop     FopP       ST(i),ST                     FMULP ST(2),ST
Real memory      Fop        {ST,} short-real/long-real   FDIV AZIMUTH
Integer memory   FIop       {ST,} word-integer/          FIDIV N_PULSES
                            short-integer

NOTES: Braces {} surround implicit operands; these are not coded and are
shown here for information only.

op = ADD    destination ← destination + source
     SUB    destination ← destination - source
     SUBR   destination ← source - destination
     MUL    destination ← destination * source
     DIV    destination ← destination ÷ source
     DIVR   destination ← source ÷ destination

(Used with permission of Intel Corporation.)


Addition Instructions 

FADD {ST(1),ST} 18 microseconds

FADD {ST,}real-memory 25 microseconds

FADD ST(i),ST or ST,ST(i) 17 microseconds

FADDP ST(i),ST 18 microseconds

FIADD {ST,}integer-memory 27 microseconds

FADD (add real), FADDP (add real and pop), and FIADD (add integer)
add the source operand to the destination operand and leave the sum
in the destination. In addition, FADDP pops the stack.


Subtraction Instructions 


FSUB {ST(1),ST} 18 microseconds

FSUB {ST,}real-memory 25 microseconds

FSUB ST(i),ST or ST,ST(i) 17 microseconds

FSUBP ST(i),ST 18 microseconds

FISUB {ST,}integer-memory 27 microseconds

FSUB (subtract real), FSUBP (subtract real and pop), and FISUB (subtract 
integer) subtract the source operand from the destination operand and 
leave the difference in the destination. In addition, FSUBP pops the stack. 

FSUBR {ST(1),ST} 18 microseconds

FSUBR {ST,}real-memory 25 microseconds

FSUBR ST(i),ST or ST,ST(i) 17 microseconds

FSUBRP ST(i),ST 18 microseconds

FISUBR {ST,}integer-memory 27 microseconds


FSUBR (reversed subtract real), FSUBRP (reversed subtract real and pop), 
and FISUBR (reversed subtract integer) subtract the destination operand 
from the source operand and leave the difference in the destination. In 
addition, FSUBRP pops the stack. 


Multiplication Instructions 


FMUL {ST(1),ST} 28 microseconds

FMUL {ST,}real-memory 34 microseconds

FMUL ST(i),ST or ST,ST(i) 28 microseconds

FMULP ST(i),ST 28 microseconds

FIMUL {ST,}integer-memory 28 microseconds


FMUL (multiply real), FMULP (multiply real and pop), and FIMUL (multiply
integer) multiply the source and destination operands and leave the
product in the destination. In addition, FMULP pops the stack.
Division Instructions


FDIV {ST(1),ST} 41 microseconds

FDIV {ST,}real-memory 48 microseconds

FDIV ST(i),ST or ST,ST(i) 40 microseconds

FDIVP ST(i),ST 41 microseconds

FIDIV {ST,}integer-memory 49 microseconds


FDIV (divide real), FDIVP (divide real and pop), and FIDIV (divide
integer) divide the destination operand by the source operand and leave
the quotient in the destination. In addition, FDIVP pops the stack. Note
that FIDIV yields a temporary-real quotient.


FDIVR {ST(1),ST} 41 microseconds

FDIVR {ST,}real-memory 48 microseconds

FDIVR ST(i),ST or ST,ST(i) 40 microseconds

FDIVRP ST(i),ST 41 microseconds

FIDIVR {ST,}integer-memory 49 microseconds


FDIVR (divide real reversed), FDIVRP (divide real reversed and pop),
and FIDIVR (divide reversed integer) divide the source operand by the
destination operand and leave the quotient in the destination. In
addition, FDIVRP pops the stack. Note that FIDIVR yields a temporary-real
quotient.


Miscellaneous Arithmetic Instructions 

FSQRT {ST} 37 microseconds 

FSQRT (square root) replaces the top of stack with its square root. 

FABS {ST} 3 microseconds 

FABS (absolute value) sets the sign of the top of stack element to positive. 

FCHS {ST} 3 microseconds 

FCHS (change sign) changes the sign of the top of stack element. 

One more instruction really belongs in this chapter, even though it is
a constant instruction and constant instructions are covered in Chapter
12, because it is simple and extremely useful. It is:

FLDZ {ST} 3 microseconds 

FLDZ (load zero) pushes a zero onto the top of the stack. 
Comparison Instructions

The 8087 includes a number of instructions for making size comparisons
between numbers on the register stack. These instructions are used for
tasks such as identifying the largest number in an array or determining
whether a number is less than, equal to, or greater than zero. We describe
the instructions here. Illustrative programming examples appear in
Chapter 7.

The 8087 instruction set includes six comparison instructions and one
examine instruction. All the instructions operate on the stack top. Most
compare the stack top to a specified source operand. There are four possible
outcomes of a comparison operation: ST > source, ST < source, ST = source,
or ST and source are non-comparable. The comparisons are reported by
setting the condition code bits C3 and C0 in the status word, as indicated
in Table 6-4. The condition code can be examined by using the processor
control instruction, FSTSW, discussed below.


Table 6-4. Condition code setting following comparison.

C3   C0   Order
0    0    ST > source
0    1    ST < source
1    0    ST = source
1    1    non-comparable

(Used with permission of Intel Corporation.)


Note that a non-comparable result arises when NANs or projective infinity
are compared. Non-comparable usually indicates a previous overflow or
illegal operation.
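Table 6-4 amounts to a small decision procedure. The Python sketch below returns the (C3, C0) pair a comparison would report. A Python NaN stands in for the non-comparable case, and projective infinity is not modeled. The function name is hypothetical. FTST corresponds to calling it with a source of 0.0.

```python
import math

def compare_codes(st: float, source: float) -> tuple:
    """Return (C3, C0) as listed in Table 6-4 for a comparison of ST with source."""
    if math.isnan(st) or math.isnan(source):
        return (1, 1)          # non-comparable
    if st > source:
        return (0, 0)
    if st < source:
        return (0, 1)
    return (1, 0)              # ST = source

print(compare_codes(-3.0, 0.0))   # what FTST would report for -3: (0, 1)
```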


FCOM {ST,ST(1)} 9 microseconds 

FCOM {ST,}ST(i) 9 microseconds 

FCOM {ST,}real-memory 17 microseconds 

FCOM (compare real) compares the stack top to the source and sets the 
condition code bits. Temporary real format may not be used in the real- 
memory form. 


FCOMP {ST,ST(1)} 10 microseconds 

FCOMP {ST,}ST(i) 10 microseconds 

FCOMP {ST,}real-memory 17 microseconds 

FCOMP (compare real and pop) executes a FCOM and then pops the 
stack, discarding the contents of the stack top. 
FCOMPP {ST,ST(1)} 10 microseconds

FCOMPP (compare real and pop twice) executes a FCOM and then pops 
the stack twice. Thus, to compare two numbers in memory, push both 
onto the stack and then use a FCOMPP. 

FICOM {ST,}integer-memory 19 microseconds 

FICOM (compare integer) compares the stack top to a word integer or 
short integer in memory. 

FICOMP {ST,}integer-memory 19 microseconds

FICOMP (compare integer and pop) executes a FICOM and then pops 
the stack. 

FTST {ST} 9 microseconds 

FTST (test) compares the stack top to zero. 

FXAM {ST} 4 microseconds 

FXAM (examine) examines the top of stack and sets the condition code
bits C0, C1, C2, and C3 to indicate what sort of value is being held. (The
various "sorts" were discussed in Chapter 5.) Table 6-5 shows the
possible combinations.


Table 6-5. Condition code settings following FXAM.

Condition Code
C3   C2   C1   C0   Interpretation
0    0    0    0    + Unnormal
0    0    0    1    + NAN
0    0    1    0    - Unnormal
0    0    1    1    - NAN
0    1    0    0    + Normal
0    1    0    1    + ∞
0    1    1    0    - Normal
0    1    1    1    - ∞
1    0    0    0    + 0
1    0    0    1    Empty
1    0    1    0    - 0
1    0    1    1    Empty
1    1    0    0    + Denormal
1    1    0    1    Empty
1    1    1    0    - Denormal
1    1    1    1    Empty

(Used with permission of Intel Corporation.)
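Reading Table 6-5 column by column reveals a pattern. C1 carries the sign, and C3, C2, and C0 together name the kind of value. The decoder below is a hypothetical sketch built directly from the table. It spells ∞ as "Infinity" and leaves off the sign for an empty register, where the sign means nothing.

```python
# (C3, C2, C0) -> kind of value, read off Table 6-5; C1 supplies the sign.
CLASSES = {
    (0, 0, 0): "Unnormal",
    (0, 0, 1): "NAN",
    (0, 1, 0): "Normal",
    (0, 1, 1): "Infinity",
    (1, 0, 0): "Zero",
    (1, 0, 1): "Empty",
    (1, 1, 0): "Denormal",
    (1, 1, 1): "Empty",
}

def fxam_meaning(c3: int, c2: int, c1: int, c0: int) -> str:
    """Interpret the four condition code bits left by FXAM."""
    kind = CLASSES[(c3, c2, c0)]
    if kind == "Empty":
        return kind                       # the sign of an empty register is meaningless
    return ("- " if c1 else "+ ") + kind

print(fxam_meaning(0, 1, 1, 1))   # - Infinity
```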
In order to make use of comparison instructions, we need to retrieve the
condition code bits. The condition codes are retrieved with the processor 
control instruction, FSTSW. 

FSTSW word-integer 5 microseconds 

FSTSW (store status word) stores the 8087 status word at the two-byte
destination location. All the comparison instructions set bits in the status
word; FSTSW is used to move the status word into memory so that the
appropriate bits can be examined and appropriate action taken. Generally,
FSTSW should be followed by an FWAIT, to ensure that the condition
codes are actually stored in memory before the program proceeds.
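Once the status word is in memory, the program must pick the condition code bits out of it. The Python sketch below assumes the 8087 status-word layout: C0 in bit 8, C1 in bit 9, C2 in bit 10, and C3 in bit 14. All four bits therefore lie in the high-order byte of the stored word. The function name is hypothetical.

```python
def condition_codes(status_word: int) -> dict:
    """Extract C0..C3 from a 16-bit status word (bit positions assumed, see text)."""
    return {
        "C0": (status_word >> 8) & 1,
        "C1": (status_word >> 9) & 1,
        "C2": (status_word >> 10) & 1,
        "C3": (status_word >> 14) & 1,
    }

print(condition_codes(0x4500))   # {'C0': 1, 'C1': 0, 'C2': 1, 'C3': 1}
```

By Table 6-4, the word 0x4500 after a comparison would mean "non-comparable," since both C3 and C0 are set.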

This completes our coverage of the basic 8087 instruction set. For most
programs, these instructions are sufficient. More advanced 8087
instructions are discussed in Chapter 12.

Our First Program—Adding Up An 
Array of Numbers 

Our first program is picked to show off the speed and ease in using the 
8087. This program runs about 200 times faster than an equivalent BASIC 
program without the 8087! 

To write a complete 8087 program, we need a number of details that 
we haven't covered. For example, we really ought to specify how the 
routine gets its arguments from BASIC. In the interest of preserving 
everyone's sanity, we are going to cheat just this once by leaving out 
some details. Therefore, the program below won't run as it stands. (The 
program appears in full in Chapter 9.) 

We assume that, elsewhere in the program, someone has already defined
a single precision array named ARRAY. Our task is to add up the
numbers stored in ARRAY and place the result, accurate to at least single
precision, in a variable named DSUM. The integer variable N holds the
number of elements in ARRAY. (ARRAY runs from ARRAY(0) to
ARRAY(N-1).) A fragment of a BASIC program to do the job follows:

10 DEFDBL D
20 DEFINT I
30 DSUM=0
40 FOR I=0 TO N-1
50 DSUM=DSUM+ARRAY(I)
60 NEXT I

Notice that we collected the sum in a double precision variable to ensure 
getting at least single precision accuracy for the final answer. 

Our 8087 code appears below. The program assumes that ARRAY is
an array of single precision memory locations, that N holds a
non-negative integer, and that DSUM is a double precision memory location.
Everything on a line after a semicolon is a comment. We have used
comments to number the lines and mark each instruction as either an 
8088 or an 8087 instruction. 



             MOV   CX,N          ;1  <8088>
             FLDZ                ;2  <8087>
             JCXZ  DONE_ADDING   ;3  <8088>
             MOV   BX,0          ;4  <8088>
LOOP_TOP:    FADD  ARRAY[BX]     ;5  <8087>
             ADD   BX,4          ;6  <8088>
             LOOP  LOOP_TOP      ;7  <8088>
DONE_ADDING: FSTP  DSUM          ;8  <8087>


The program uses the following strategy: place the number of array
elements in the 8088 register CX. Subtract one from this register each
time through the adding-up loop and quit when the register hits zero.
Use 8088 register BX to keep track of where we are in ARRAY. (8088
instructions are covered in detail in the next chapter.) A line-by-line
explanation of the program follows:

1. MOV CX,N. Load N into the CX register. (The 8088 instruction "LOOP",
in line 7, subtracts 1 from the CX register. When CX hits zero, the
program has gone all the way through the array, so we will jump
out of the loop.)

2. FLDZ. Push a zero onto the 8087 stack. We accumulate the running
total in the top of stack element.

3. JCXZ DONE_ADDING. If CX (that is, N) is zero, jump to
DONE_ADDING before entering the loop.

4. MOV BX,0. Set the BX register equal to zero. BX is used as an index
for ARRAY. When BX equals zero, we get the first element of ARRAY.

5. LOOP_TOP: FADD ARRAY[BX]. Add the current element of
ARRAY into the running sum we are accumulating in the top of
the stack.

6. ADD BX,4. Add 4 to the byte offset in BX. Why? Single precision numbers
occupy four bytes, so we have to move along ARRAY four bytes at
a hop. (Some things are just naturally more clumsy in assembly
language than in a higher level programming language.)

7. LOOP LOOP_TOP. The LOOP instruction subtracts one from CX. If
CX is not yet zero, the program "loops" to LOOP_TOP; otherwise
we proceed to the next instruction, falling out of the bottom of the
loop since we must have already added up all N numbers.

8. DONE_ADDING: FSTP DSUM. Store the answer in DSUM.
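For readers who want to trace the loop step by step, here is a Python sketch that mirrors it line for line:

- offset plays BX and moves four bytes at a hop.
- count plays CX and is counted down as LOOP does.
- total plays the 8087 stack top.

It is a model of the control flow only. Python floats are double precision, not temporary real, and the function name and sample data are hypothetical.

```python
import struct

def sum_single_array(raw: bytes, n: int) -> float:
    """Add n single precision numbers stored back to back in raw."""
    count = n                                    # 1: MOV CX,N
    total = 0.0                                  # 2: FLDZ
    offset = 0                                   # 4: MOV BX,0
    while count != 0:                            # 3: JCXZ, and 7: LOOP's test
        (element,) = struct.unpack_from("<f", raw, offset)
        total += element                         # 5: FADD ARRAY[BX]
        offset += 4                              # 6: ADD BX,4
        count -= 1                               # 7: LOOP decrements CX
    return total                                 # 8: FSTP DSUM

values = struct.pack("<4f", 1.0, 2.0, 3.0, 4.5)   # hypothetical ARRAY contents
print(sum_single_array(values, 4))                # 10.5
```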

Besides the fact that it was a lot easier to write the BASIC program, 
what's the difference between BASIC and our 8087 code? One, the 8087 
program is a little more accurate, though on most problems we'd probably 
never notice the difference. Two, the 8087 is a bit faster. Adding 10,000 
numbers takes approximately 46 seconds in BASIC. The 8087 needs about 
one-fourth of one second. 



Introduction to 8088 
Assembly Language 
Programming 


Before the era of the 8087, all personal computer thinking was done with
a general purpose microprocessor such as the Intel 8088. A number
crunching personal computer combines the mathematical power of the
8087 with the general programming capabilities of the 8088. The 8087
needs the 8088 to talk to the outside world. In this chapter, we discuss
8088 programming.

This brings us to a dilemma. The 8087 is a simple, elegant machine. 
The 8088 is a complex, elegant machine. Chapters 6 and 12 of this book 
present a complete, detailed description of the 8087 instruction set. A 
similar description of the 8088 instruction set would require a book, and
wouldn't be very interesting to readers who just want to crunch numbers. 
On the other hand, you can't get to the 8087 except through the 8088. 


As a compromise, we discuss just those 8088 features needed to get 
through to the 8087. We don't attempt to cover all features of the 8088 
or to talk about assembly language programming in general. This chapter 
is oriented toward the BASIC programmer; the experienced assembly 
language programmer is asked to forgive the occasional simplification. 
(If you are already comfortable with 8088 assembly language, you can 
skip this chapter entirely.) For full details on the 8088 (and 8086 family) 
we recommend: 

iAPX 88 Book, by Intel;

iAPX 86,88 User's Manual, by Intel;

The 8086 Primer, by Stephen P. Morse, Hayden Book Company; and

IBM PC Assembly Language, by Leo J. Scanlon, Robert J. Brady Co.
Overview of the 8088


Machine language instructions are much less powerful than BASIC
commands. (A typical line of BASIC might be equivalent to 10 to 100 lines
of machine language instructions.) Consequently, it's easy to understand
what a single line of 8088 code does, but it can be very tedious to put
together enough lines to do anything useful. For example, suppose we
want to copy the data in integer variable A into integer variable B. In
BASIC we write:


B=A 

8088 code might be: 

MOV AX,A    ;MOVE CONTENTS OF LOCATION A INTO REGISTER AX

MOV B,AX    ;MOVE CONTENTS OF REGISTER AX INTO LOCATION B

A and B are integer variables in both sets of code, but there the
similarity ends. We see the following differences:

• BASIC uses mathematical notation. 8088 notation takes the form of 
a command to the CPU. 

• BASIC deals directly with the variables of interest. The 8088 uses 
internal registers as intermediaries. In this example, the data in A 
is transferred into a register named "AX" and then transferred from 
the AX register into B. 

• Anything following a semicolon is a comment in assembly language. 
BASIC uses the apostrophe and REM statement for this purpose. 

Suppose we wanted to deal with single precision numbers instead of
integers. In BASIC, we declare the variables A and B to be of the
appropriate type. Thereafter, B = A works equally well for any type of variable.
8088 code would have to be modified, leading us to some further
differences.





MOV AX,A      ;MOVE THE FIRST HALF OF A INTO AX
MOV B,AX      ;MOVE AX INTO THE FIRST HALF OF B
MOV AX,A+2    ;MOVE THE SECOND HALF OF A INTO AX
MOV B+2,AX    ;MOVE AX INTO THE SECOND HALF OF B

• BASIC deals with data a number at a time. The 8088 works either
on a word (two bytes) or a byte at a time. Since a single precision
number occupies two words, two sets of MOV operations are
required to move a single precision number.

• Unlike BASIC, which thinks in terms of variables, the 8088
fundamentally thinks in terms of memory locations. In the instruction "MOV
AX,A", "A" represents a memory location to be assigned by the
assembler. "A+2" means the memory location 2 bytes after "A".
"A+2" does not mean add 2 to the value stored in A.
8088 Program Structure

An 8088 assembly language program is structured into procedures, code 
segments, and data segments. 

Each separate program module is identified to the assembler as a
procedure by the PROC and ENDP directives (discussed below). The
assembler remembers the location of each block of code identified so that the
module can be called as a subroutine from another 8088 assembly
language program or by use of the BASIC CALL statement. Normally, each
procedure is a self-contained unit intended to perform one task in a larger
program.

Programs written for the 8088 segregate code and data into different 
areas of memory called segments. While any number of segments may 
reside in memory simultaneously, only one code segment and one data 
segment (plus a stack and an extra segment described below) may be 
active at any one time. Segments are identified to the assembler with the 
SEGMENT and ENDS directives (discussed below). Segments are limited 
in length to 64K bytes. 

One way to think of an assembly language program is that we write 
out an exact picture of how memory looks before execution begins. Some 
areas of memory hold program constants or are set aside to hold results 
produced by the computer. These areas are placed in data segments. 
Other areas of memory hold the executable code, as translated from 
assembly language into machine language, by the assembler. The code 
is logically organized into procedures. One or more procedures are then
placed in each code segment. When we run the program, the computer
places each segment, as a block, in memory and then begins execution. 

To master the 8088, one must understand: 

1. General registers 

2. Memory addressing 

3. Labels and data definition 

4. Some basic 8088 instructions 

5. Comparisons 

6. Branching 

7. Segments 

8. Memory stack 

9. Subroutine branching and returns 

10. Assembler directives 

General Registers 

The 8088 has eight general registers, each of which holds one 16-bit word. 
The registers are named AX, BX, CX, DX, SI, DI, BP, and SP. In the 
MOV examples above, any of these registers could have been used in 
place of AX. However, each register has various special purposes in
addition to its general role. The special uses of interest to us are: 

AX and DX—Register AX is sometimes called the accumulator. A few 
8088 instructions will only work with AX. Instructions that produce 
a double length result, such as multiplication, place the result in AX 
and DX. 

BX, SI, and DI—base and index registers (see Memory Addressing 
below). 

CX—count register (see Branching below). 

BP and SP—stack pointers (see Memory Stack below). 

In addition, registers AX, BX, CX, and DX can each be treated as a 
pair of 8-bit registers. The high-order bytes are addressed as AH, BH, 
CH, and DH and the low-order bytes are addressed as AL, BL, CL, and 
DL. Most 8088 operations can operate on either a word at a time or a 
byte at a time. Moving a byte into AH, for example, changes the high- 
order half of AX without affecting the low-order half. 
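The byte halves of AX behave like the two halves of a 16-bit number. The sketch below models a register as a plain Python integer. The function names are hypothetical. Moving a byte into AH rewrites the high byte and leaves AL alone.

```python
def move_into_ah(ax: int, byte: int) -> int:
    """AX after a byte is moved into AH: the high byte changes, AL does not."""
    return ((byte & 0xFF) << 8) | (ax & 0x00FF)

def halves(ax: int) -> tuple:
    """Split a 16-bit AX into (AH, AL)."""
    return (ax >> 8) & 0xFF, ax & 0xFF

ax = move_into_ah(0x1234, 0xAB)
print(hex(ax), [hex(h) for h in halves(ax)])   # 0xab34 ['0xab', '0x34']
```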

The AH half of AX also has a special use in moving "flags" around
(see Branching below).


Memory Addressing 

In the "MOV AX, A" instruction above, "A" represents a particular memory
location called the displacement. The first byte of memory is numbered
0, the second 1, and so forth. The assembler figures out the number of 
the memory location for A and sticks the number into the instruction. 
Note that the 8088 addresses bytes, not words, so the first word begins 
at 0, the second at 2, the third at 4. (It is perfectly acceptable to store a 
byte at 0, a word at 1 and 2, and so forth. Words don't have to fall on 
even-numbered locations. However, the 8086 side of the 8088/8086 family 
will run a tiny bit faster when words do fall on even locations.) 

If we want to use the byte after location A we code "A+1". Analogously,
the word after location A is "A+2", and the byte before location
A is addressed as "A-1".

Just as BASIC allows indexed arrays, the 8088 allows us to index
memory. In BASIC the first element of an array A is A(0), the second A(1),
and so forth. To pick different elements at different points in the program
we code A(I), and set the variable I appropriately. In 8088 code we index 
memory by indicating that the value held in one of the registers is to be 
added to the displacement in calculating the address. We tell the 8088 
which register to use by placing its name in square brackets, as in A[BX]. 
Further, we can "double index" memory by placing a second register in 
square brackets, as in A[BX][SI]. Thus, if A is location 75, the BX register 
holds 150, and the SI register holds 1000, A[BX][SI] is location 1225. 
Unfortunately, we are somewhat restricted in which registers can be
used as indexes. If we use one register, it can be BX, SI, or DI. If we use 
two registers, one must be BX and the other can be either SI or DI. 
(Actually, BP can be used rather than BX as an index, but this is generally 
not done for reasons that become clear when we discuss the memory 
stack.) 

Thus, a memory address consists of a displacement and zero, one, or 
two index registers. The displacement may be omitted, in which case a 
displacement of zero is assumed. This usage is quite common, because 
when we call a subroutine from BASIC, BASIC passes the subroutine 
the address of each argument. If we call a subroutine with an argument 
A(0), the subroutine might place the address of A(0) into the BX register, 
use the SI register to hold an index, and address the array by [BX][SI] 
with no displacement. 

Note some critical differences between indexing in BASIC and indexing
in assembly language. In BASIC if the index is 17, we get the 18th (counting
from zero, remember) element of the array, regardless of whether the array
is of type integer, single precision, or double precision. In machine
language if the index is 17, we get the 18th byte, not the 18th element of
the array. Depending on the type of data being used, consecutive
elements have indexes 0, 2, 4, . . ., 0, 4, 8, . . ., or 0, 8, 16, . . . . Also, in BASIC
we can specify multi-dimensional arrays. 8088 indexing is all
one-dimensional.

When the displacement is added to the value of the index registers the
result is a 16-bit logical address. Therefore, the address must be between
0 and (2^16) - 1, that is, within a 64K range. (It is no coincidence that
BASIC is limited to a 64K area.)
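Both points can be checked with a little arithmetic:

- An address is the displacement plus up to two index registers.
- An element number must be scaled by the element size before it can be used as an index.

The sketch below reproduces the A[BX][SI] example from the text, where A is 75, BX holds 150, and SI holds 1000. Masking to 16 bits models the assumption that an offset past 65,535 wraps around within the 64K range. Both function names are hypothetical.

```python
def effective_address(displacement=0, base=0, index=0):
    """Displacement plus up to two index registers, kept to 16 bits (assumed wrap)."""
    return (displacement + base + index) & 0xFFFF

def element_offset(element, element_bytes):
    """Byte index for a BASIC-style element number with 2-, 4- or 8-byte data."""
    return element * element_bytes

print(effective_address(75, 150, 1000))          # 1225
print([element_offset(k, 4) for k in range(3)])  # single precision: [0, 4, 8]
```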

Most operations specify a register and a memory location. Instructions 
can also specify two registers, as in 

MOV AX,BX   ;MOVE THE CONTENTS OF THE BX REGISTER INTO AX

Some instructions allow one argument to be an immediate operand. An
immediate operand is a constant built right into the instruction: the
value is used "immediately," in contrast to being fetched from memory.
For example, to set the register AX to zero and the value of memory
location A to minus one:

MOV AX,0

MOV A,-1

8088 instructions such as MOV can operate either a byte at a time or
a word at a time. In truth, MOV is really two separate instructions, "move
word" and "move byte." The assembler looks at the specified operands
to decide which instruction we mean. Most of the time the assembler
can figure out whether we want a byte or a word by examining the
specifications used to define the memory location (see Labels and Data
Definition below). Sometimes there aren't any such specifications, such
as when we use an index register without a displacement, and sometimes
we want to override the original specifications. To order the assembler
to think in terms of a byte or a word, use the "PTR" (pointer) directive.
We indicate that location A is a byte or a word by saying "BYTE PTR A"
or "WORD PTR A." So to move an integer whose location is held in BX
into a location held in SI we might code:

MOV AX,WORD PTR [BX]

MOV WORD PTR [SI],AX

Labels and Data Definition 

An assembly language program consists of a series of one-line commands.
Commands are actually of two sorts: instructions and assembler
directives. A command may be preceded by an optional identifying label. 
To label an instruction, begin the line with the desired label and a colon. 
The program can jump to a labeled instruction in much the same way 
as a program can GOTO a line number in BASIC. To label a line containing 
a directive, begin the line with the desired label, but omit the colon. 

THIS_IS_A_LABEL: ilOV AX-iA 

Assembler directives do not generate any machine language code.
Instead they give the assembler information or ask it to perform a task,
such as setting aside a memory location to be used as data storage. For
example, the assembler directive "DW" sets aside two bytes of storage.
It can be followed by an initial value and the storage area can be labeled.

A DW 37

Setting aside and labeling memory is somewhat analogous to the BASIC
statement DIM. "DW" stands for "define word." To define a word with
no initial value, tell the assembler "DW ?". We can also define a series
of words with a directive like "DW 3,5,?,-2". Or we could set aside 10
uninitialized words with "DW 10 DUP(?)". Since an address is actually
represented by a 16-bit integer, we can also initialize a memory location
to contain the address of some other instruction, as in

POINT_TO_A_LABEL DW THIS_IS_A_LABEL

The 8088 deals with bytes and words. To set aside one or more bytes,
we use the "define byte" directive, DB. The 8087 deals with many
more data types. Table 7-1 shows all the storage allocation directives.

The assembler knows how much memory is supposed to be associated
with a particular storage allocation directive. This knowledge is used in
two ways. First, if you set aside storage using Define Byte, as in "A DB
5", and then try to use a word instruction, as in "MOV AX,A", the
assembler will warn you of a type mismatch. If you intend to move the
two bytes at A and A+1, you can override this mechanism by using the
instruction "MOV AX,WORD PTR A".
Table 7-1. Storage allocation directives.

Directive  Interpretation      Bytes  Pointer type  Data types
DB         Define Byte           1    BYTE PTR      byte
DW         Define Word           2    WORD PTR      word integer
DD         Define Doubleword     4    DWORD PTR     short integer, short real
DQ         Define Quadword       8    QWORD PTR     long integer, long real
DT         Define Tenbyte       10    TBYTE PTR     packed decimal, temporary real

(Used with permission of Intel Corporation.)


Second, the assembler uses the storage allocation directives to decide
whether 8087 instructions should operate on single or double precision
data. For example,

FLD DWORD PTR A

loads a single precision number located at bytes A, A+1, A+2, and A+3
onto the 8087 stack. The instruction

FLD QWORD PTR A

loads a double precision number located at bytes A through A+7.
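A small Python model makes the role of DWORD PTR and QWORD PTR concrete. It is only a sketch: Python's struct module stands in for memory, the helper names are hypothetical, and it assumes the 8087's short and long real formats share the bit layouts of IEEE single and double precision (which they essentially do). Reading a long real with a four-byte load picks up only half of it.

```python
import struct

def store_short_real(x):
    """Bytes a DD location holding x as a short real would contain."""
    return struct.pack("<f", x)      # 4 bytes, low byte first

def store_long_real(x):
    """Bytes a DQ location holding x as a long real would contain."""
    return struct.pack("<d", x)      # 8 bytes, low byte first

def fld_dword(mem):
    """Model of FLD DWORD PTR A: read bytes A..A+3 as a short real."""
    return struct.unpack("<f", mem[:4])[0]

def fld_qword(mem):
    """Model of FLD QWORD PTR A: read bytes A..A+7 as a long real."""
    return struct.unpack("<d", mem[:8])[0]

short_a = store_short_real(0.1)
long_a = store_long_real(0.1)
print(len(short_a), len(long_a))   # 4 8
print(fld_dword(short_a))          # 0.10000000149011612 (single precision)
print(fld_qword(long_a))           # 0.1
print(fld_dword(long_a))           # wrong width: only half of the long real is read
```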

We can also label a memory location without setting aside storage by
using the directives EQU and THIS WORD. "THIS WORD" takes on the
value of the next memory location and "EQU" assigns a value to a name.
For example:

A DW 10 DUP (?)

B EQU THIS WORD

DW 30 DUP (?)

These directives set aside 40 words of storage. If A ends up being
located at byte 100 of memory, then B will reference location 120,
because the 10 words at A occupy bytes 100 through 119.
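The location arithmetic can be checked with a short Python model of the assembler's location counter. The function and its (label, words) input format are invented for illustration; an entry with zero words stands for EQU THIS WORD, which names a location without reserving storage.

```python
WORD = 2  # bytes reserved by each DW item

def layout(start, items):
    """Model of the location counter for a run of DW directives.

    items is a list of (label, words) pairs. words == 0 models
    'label EQU THIS WORD'; a label of None models an unlabeled DW.
    """
    offsets = {}
    here = start
    for label, words in items:
        if label is not None:
            offsets[label] = here    # label takes the current location
        here += words * WORD         # then storage (if any) is reserved
    return offsets, here

offsets, end = layout(100, [("A", 10), ("B", 0), (None, 30)])
print(offsets)                # {'A': 100, 'B': 120}
print((end - 100) // WORD)    # 40 words reserved in all
```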


Some Basic 8088 Instructions 

In this section, we cover a few of the most common 8088 instructions, 
concentrating on those instructions we need later for programs. 

ADD destination,source 

ADD (Add) adds the destination and the source and places the sum in 
the destination. 
AND destination,source

AND (Logical and) does a bit by bit "and" operation. Bit "i" in the
destination is set to one if bit "i" is one in both source and destination,
otherwise it is set to zero.

DEC destination 

DEC (Decrement) subtracts one from the destination. 

INC destination 

INC (Increment) adds one to the destination. 

MOV destination,source 

MOV (Move) copies the value of the source into the destination. 

MUL source 

MUL (Multiply) multiplies the source by AL or AX. If the source is a 
byte, it is multiplied by AL, and the result is placed in AH and AL (that 
is, the 16-bit answer that occupies AX). If the source is a word, the
32-bit answer is placed (upper 16 bits) in DX and (lower 16 bits) in AX. Both
operands are treated as unsigned binary numbers. The source cannot be 
an immediate operand. 
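The following Python sketch models the unsigned MUL results described above (the function names are hypothetical, and Python integers stand in for the registers). Note in the last case that 0FFFFH is treated as 65,535, not as -1.

```python
def mul8(al, src):
    """Model of byte MUL: AX = AL * src, both treated as unsigned."""
    return (al & 0xFF) * (src & 0xFF)        # always fits in 16 bits

def mul16(ax, src):
    """Model of word MUL: DX:AX = AX * src, both treated as unsigned."""
    product = (ax & 0xFFFF) * (src & 0xFFFF)
    dx = product >> 16                       # upper 16 bits
    ax_out = product & 0xFFFF                # lower 16 bits
    return dx, ax_out

print(hex(mul8(0xFF, 0xFF)))   # 0xfe01, so AH = 0FEH and AL = 01H
print(mul16(1000, 1000))       # (15, 16960): 15*65536 + 16960 = 1000000
print(mul16(0xFFFF, 2))        # (1, 65534)
```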

OR destination,source 

OR (Logical inclusive or) does a bit by bit "or" operation. Bit "i" in the 
destination is set to one if bit "i" is one in either the source or the 
destination, otherwise it is set to zero. 

SHL destination,source 

SHL (Shift logical left) shifts the bits in the destination to the left. Bits 
that move out on the left "fall off the end" and zeros are moved in on 
the right. The source can either be "1," in which case the destination is 
shifted left one bit, or it can be CL, the lower half of the CX register, in 
which case the destination is shifted left the number of places indicated 
by the value held in CL. 

Notice that shifting a number left one place is the same as multiplying
the number by two. We frequently need to multiply by two or by a power
of two, and the SHL instruction takes only six microseconds, while the
MUL instruction takes about 30.

SHR destination,source 

SHR (Shift logical right) shifts the bits in the destination to the right. Bits 
that move out on the right "fall off the end" and zeros are moved in on 
the left. The source can either be "1," in which case the destination is 
shifted right one bit, or it can be CL, in which case the destination is
shifted right the number of places indicated by the value held in CL. 
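A Python model of the two shifts (hypothetical helper names, 16-bit operands assumed) shows both the multiply-by-two rule and what happens when a bit falls off the end.

```python
MASK16 = 0xFFFF

def shl16(value, count):
    """Model of SHL on a 16-bit operand: bits pushed past bit 15 are lost."""
    return (value << count) & MASK16

def shr16(value, count):
    """Model of SHR on a 16-bit operand: zeros enter from the left."""
    return (value & MASK16) >> count

# Shifting left n places multiplies by 2**n, as long as nothing falls off.
for v in (3, 100, 1234):
    assert shl16(v, 1) == v * 2
    assert shl16(v, 3) == v * 8

print(shl16(0x8001, 1))   # 2: the high bit fell off the end
print(shr16(100, 2))      # 25: SHR divides an unsigned value by 4
```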

SUB destination,source 

SUB (Subtract) subtracts the source from the destination and places the 
difference in the destination. 


Comparisons 

Controlling the flow of a program is easier in BASIC than in assembly 
language. In BASIC, we would jump to line 100 when A is greater than 
B with a statement combining a comparison and a conditional jump, such 
as 


IF A>B THEN GOTO 100 

In assembly language, the comparison and branching are two logically 
separate steps. First, we use a comparison (or other) operation to set 
"flags" inside the 8088. Then, we execute a branching instruction which 
examines the flags and jumps if it sees the right ones "flying." The 8088
has six status flags that concern us here (it also has three control
flags, TF, IF, and DF, which we do not need). These six flags can be
thought of as occupying six out of the 16 bits of a "flag register." The
flags, their positions, and meanings are:

CF—bit 0—carry flag 

PF—bit 2—parity flag 

AF—bit 4—auxiliary carry flag 

ZF—bit 6—zero flag 

SF—bit 7—sign flag 

OF—bit 11—overflow flag 

The flag names are suggestive of their general use. We care about the 
flags for two reasons. First, 8088 comparison instructions set some of the 
flags to zero or one. Second, 8087 comparison instructions indirectly set 
some of the flags. 

The 8088 compares two numbers by using the CMP instruction. 


CMP destination,source 

CMP (Compare) compares the destination to the source, setting the flags 
to indicate the result of the comparison. The flags are read by the jump 
instructions outlined in Table 7-2. 

The 8087 does its own comparisons, but relies on the 8088 for program 
branching. To set up an 8088 branch following an 8087 comparison, we 
need to set the 8088 flags. SAHF is used for this purpose. 
SAHF

SAHF (Store register AH into flags) sets SF, ZF, AF, PF, and CF from
bits 7, 6, 4, 2, and 0 of AH.
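SAHF's bit mapping can be written out as a Python model (a sketch; the dictionary and function names are illustrative). OF does not appear because SAHF leaves it unchanged.

```python
# AH bit number that SAHF copies into each flag.
SAHF_BITS = {"SF": 7, "ZF": 6, "AF": 4, "PF": 2, "CF": 0}

def sahf(ah):
    """Model of SAHF: the five flags loaded from register AH."""
    return {flag: (ah >> bit) & 1 for flag, bit in SAHF_BITS.items()}

print(sahf(0x41))   # {'SF': 0, 'ZF': 1, 'AF': 0, 'PF': 0, 'CF': 1}
```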

Branching 

JMP address 

The 8088 jump instruction is analogous to GO TO in BASIC. The program 
jumps from its current position to the address specified by the jump. 

The 8088 also has 18 conditional jump instructions. These instructions 
cause a jump to the specified address only if the flags have a certain 
pattern, otherwise execution continues with the next instruction. For 
example, if we execute a "JG SOME_LABEL" following a CMP, the
program goes to SOME_LABEL if the destination was greater than the source,
and continues on to the next instruction otherwise. Table 7-2 describes
the conditional jump instructions.

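One warning: an 8087 comparison sets the 8087's own condition codes, not
the 8088 flags, and they must be moved into the flags before a conditional
jump can test them. See "8087 Branching" below.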


Table 7-2. 8088 conditional jump instructions.

Mnemonic   Condition tested           "Jump if..."
JA/JNBE    (CF or ZF) = 0             above/not below nor equal
JAE/JNB    CF = 0                     above or equal/not below
JB/JNAE    CF = 1                     below/not above nor equal
JBE/JNA    (CF or ZF) = 1             below or equal/not above
JC         CF = 1                     carry
JE/JZ      ZF = 1                     equal/zero
JG/JNLE    ((SF xor OF) or ZF) = 0    greater/not less nor equal
JGE/JNL    (SF xor OF) = 0            greater or equal/not less
JL/JNGE    (SF xor OF) = 1            less/not greater nor equal
JLE/JNG    ((SF xor OF) or ZF) = 1    less or equal/not greater
JNC        CF = 0                     not carry
JNE/JNZ    ZF = 0                     not equal/not zero
JNO        OF = 0                     not overflow
JNP/JPO    PF = 0                     not parity/parity odd
JNS        SF = 0                     not sign
JO         OF = 1                     overflow
JP/JPE     PF = 1                     parity/parity even
JS         SF = 1                     sign

NOTE: "above" and "below" refer to the relationship of two unsigned values;
"greater" and "less" refer to the relationship of two signed values.

(Used with permission of Intel Corporation.)
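The difference between "above" and "greater" is easiest to see by modeling CMP. The Python sketch below is a model, not 8088 code, and its function names are invented for illustration: it computes the four flags a 16-bit CMP sets and evaluates a few conditions from Table 7-2. Comparing 0FFFFH with 1 gives flags that satisfy JA (65,535 is above 1) but not JG (as a signed number, 0FFFFH is -1).

```python
def cmp16(dest, src):
    """Model of a 16-bit CMP: compute dest - src and return CF, ZF, SF, OF."""
    dest &= 0xFFFF
    src &= 0xFFFF
    result = (dest - src) & 0xFFFF
    cf = 1 if dest < src else 0              # borrow: unsigned dest < src
    zf = 1 if result == 0 else 0
    sf = (result >> 15) & 1                  # copy of the result's top bit
    # Signed overflow: operand signs differ and the result's sign
    # differs from the destination's sign.
    of = ((dest ^ src) & (dest ^ result) & 0x8000) >> 15
    return {"CF": cf, "ZF": zf, "SF": sf, "OF": of}

# A few of the conditions from Table 7-2.
def ja(f):
    return (f["CF"] | f["ZF"]) == 0

def jb(f):
    return f["CF"] == 1

def je(f):
    return f["ZF"] == 1

def jg(f):
    return ((f["SF"] ^ f["OF"]) | f["ZF"]) == 0

def jl(f):
    return (f["SF"] ^ f["OF"]) == 1

f = cmp16(0xFFFF, 1)        # CMP AX,1 with AX = 0FFFFH
print(ja(f), jg(f))         # True False
```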
A simple program illustrates 8088 branching technique. Suppose we
want to add up an array of 100 integers in memory and put the answer 
in a location called SUM. 



        MOV  AX,0           ;CLEAR OUT AX TO HOLD THE RUNNING SUM
        MOV  CX,100         ;PUT A COUNT INTO CX
        MOV  BX,0           ;USE BX AS AN INDEX REGISTER
NEXT_ADD:
        ADD  AX,ARRAY[BX]   ;ADD IN THE CURRENT ELEMENT
        ADD  BX,2           ;POINT BX AT THE NEXT ELEMENT
        DEC  CX             ;SUBTRACT ONE FROM THE COUNTER
        CMP  CX,0           ;IS THE COUNTER ZERO YET?
        JG   NEXT_ADD       ;IF NOT, ADD ANOTHER ELEMENT
        MOV  SUM,AX

ARRAY   DW   100 DUP(?)
SUM     DW   ?




Because looping is so important, the 8088 has specialized instructions for 
this sort of routine. 

JCXZ address 

JCXZ (Jump if CX equals zero) takes a conditional branch if the CX register 
equals zero. In the program above, "CMP CX,0" and "JG NEXT_ADD" 
test the CX register at the bottom of the loop. We could instead use JCXZ 
to test the CX register at the top of the loop, as we illustrate below. The 
choice between testing at the bottom versus the top of a loop is largely 
a matter of style. We use both styles in this book to provide you with a 
variety of examples. However, as a matter of good programming practice, 
you may want to choose one style or the other and stick with it. 



        MOV  AX,0           ;CLEAR OUT AX TO HOLD THE RUNNING SUM
        MOV  CX,100         ;PUT A COUNT INTO CX
        MOV  BX,0           ;USE BX AS AN INDEX REGISTER
NEXT_ADD:
        JCXZ DONE           ;GO TO DONE IF CX EQUALS 0
        ADD  AX,ARRAY[BX]   ;ADD IN THE CURRENT ELEMENT
        ADD  BX,2           ;POINT BX AT THE NEXT ELEMENT
        DEC  CX             ;SUBTRACT ONE FROM THE COUNTER
        JMP  NEXT_ADD       ;GO TO NEXT_ADD
DONE:
        MOV  SUM,AX

ARRAY   DW   100 DUP(?)
SUM     DW   ?




LOOP address 

LOOP (Loop on CX) subtracts one from CX and then jumps to the address
if CX is not equal to zero. Thus LOOP is like a BASIC FOR-NEXT loop
with a FOR statement "FOR initial-value TO 1 STEP -1". We could
further modify the original program by replacing "DEC CX", "CMP CX,0",
and "JG NEXT_ADD" with "LOOP NEXT_ADD".
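Where the test sits matters when the count might be zero. The Python model below (function names are illustrative; CX is treated as a 16-bit register) counts how many times the loop body runs in each version. With a zero count, the JCXZ version skips the body, the CMP/JG version runs it once, and the LOOP version, which decrements 0 to 0FFFFH before testing, runs it 65,536 times.

```python
MASK16 = 0xFFFF

def to_signed(x):
    """Interpret a 16-bit value as a signed integer."""
    return x - 0x10000 if x & 0x8000 else x

def bottom_test_dec_cmp_jg(cx):
    """First program: body, DEC CX, CMP CX,0, JG (signed test at the bottom)."""
    passes = 0
    while True:
        passes += 1                  # ADD AX,ARRAY[BX] / ADD BX,2
        cx = (cx - 1) & MASK16       # DEC CX
        if not to_signed(cx) > 0:    # CMP CX,0 / JG NEXT_ADD
            return passes

def top_test_jcxz(cx):
    """Second program: JCXZ DONE at the top, JMP NEXT_ADD at the bottom."""
    passes = 0
    while cx != 0:                   # JCXZ DONE
        passes += 1
        cx = (cx - 1) & MASK16       # DEC CX
    return passes

def bottom_test_loop(cx):
    """LOOP version: body, then LOOP NEXT_ADD."""
    passes = 0
    while True:
        passes += 1
        cx = (cx - 1) & MASK16       # LOOP decrements CX first...
        if cx == 0:                  # ...and falls through only when CX is 0
            return passes

for count in (100, 1, 0):
    print(count, bottom_test_dec_cmp_jg(count),
          top_test_jcxz(count), bottom_test_loop(count))
# 100 100 100 100
# 1 1 1 1
# 0 1 0 65536
```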
You should be warned that the conditional jumps in Table 7-2, JCXZ,
and LOOP all have one limitation. They only work when the target is
within -128 to +127 bytes of the instruction that follows the jump.
Usually, the target is close enough that the limitation isn't binding.
(The assembler will warn you if the target is too far away.)
Unconditional jumps (JMP) don't have this limitation, so, if you do get
stuck, the solution is to write in an extra, close-by, unconditional JMP
as the target of the conditional jump instruction.
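The reach of a short jump comes down to a little arithmetic. The Python sketch below assumes a two-byte jump (one opcode byte plus a signed one-byte displacement, the encoding the 8088 uses for these instructions) and measures the displacement from the instruction after the jump; the function name is hypothetical.

```python
def short_jump_fits(jump_addr, jump_len, target):
    """Can a short jump at jump_addr reach target?

    The displacement is measured from the instruction that follows the
    jump and must fit in a signed byte: -128 .. +127.
    """
    displacement = target - (jump_addr + jump_len)
    return -128 <= displacement <= 127

# A 2-byte conditional jump at offset 1000H:
print(short_jump_fits(0x1000, 2, 0x1000 + 2 + 127))   # True
print(short_jump_fits(0x1000, 2, 0x1000 + 2 + 128))   # False: too far forward
print(short_jump_fits(0x1000, 2, 0x1000 + 2 - 128))   # True
```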

8087 Branching 

An 8087 comparison sets the internal 8087 condition codes. These
condition codes must be transferred into the 8088 flags prior to executing a
conditional jump instruction. Because the 8087 condition codes do not
exactly parallel the 8088 flags, a little more programming is required
following an 8087 comparison than following an 8088 comparison.

Making an 8087-comparison based decision involves three steps. 

• Execute an 8087 instruction to set the 8087 condition codes. 

• Transfer the 8087 condition codes through memory and into the 8088 
flags, using FSTSW and SAHF. 

• Execute an 8088 branching instruction. 

The 8087 processor control instruction FSTSW, store status word, stores 
the 8087 condition codes, among other things, into a two-byte area of 
memory. (FSTSW must be followed in this usage by the processor control 
instruction FWAIT.) After the FSTSW, the second byte of the memory 
area holds the condition code bits in just the right position to be loaded 
into an 8088 register and then dropped into the 8088 flags. The 8088 does 
not have four separate branching instructions corresponding to the four
combinations of C3 and C0, the two condition code bits set by the 8087
comparison instructions. The 8088 instruction JB jumps if C0 is on and
JE jumps if C3 is on. Thus a fragment of code to consider all possible
outcomes of the condition codes might look like this:

;ASSUME STATUS_WORD IS A 2-BYTE AREA OF SCRATCH MEMORY
;DEFINED ELSEWHERE

        ;DO A COMPARISON TO SET CONDITION CODES
        FSTSW STATUS_WORD
        FWAIT
;NOW GET CONDITION CODES INTO FLAGS
        MOV  AH,BYTE PTR STATUS_WORD+1
        SAHF
;NOW BRANCH AND TAKE ANY APPROPRIATE ACTIONS
        JB   LESS_OR_NON_COMP
        JE   EQUAL
        ;COME HERE FOR GREATER THAN

EQUAL:  ;COME HERE FOR EQUAL

LESS_OR_NON_COMP:
        JE   NON_COMP
        ;COME HERE FOR LESS THAN

NON_COMP:
        ;COME HERE FOR NON-COMPARABLE
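The bit positions explain why JB and JE work. In the 8087 status word C0 is bit 8 and C3 is bit 14, so in the high byte they land in bits 0 and 6, exactly where SAHF picks up CF and ZF. The Python model below (illustrative names; it models only C3 and C0, and the outcome names describe ST compared with the other operand) traces all four outcomes through the JB/JE tests.

```python
C0_BIT, C3_BIT = 8, 14      # positions of C0 and C3 in the 8087 status word

def fstsw_high_byte(c3, c0):
    """Byte at STATUS_WORD+1 after FSTSW, modeling only C3 and C0."""
    status = (c3 << C3_BIT) | (c0 << C0_BIT)
    return (status >> 8) & 0xFF    # C0 lands in bit 0, C3 in bit 6

def sahf_cf_zf(ah):
    """SAHF copies AH bit 0 into CF and AH bit 6 into ZF."""
    return ah & 1, (ah >> 6) & 1

def branch(c3, c0):
    """Follow the JB / JE tests of the 8087 branching fragment."""
    cf, zf = sahf_cf_zf(fstsw_high_byte(c3, c0))
    if cf:                                          # JB LESS_OR_NON_COMP
        return "non-comparable" if zf else "less"   # JE NON_COMP
    if zf:                                          # JE EQUAL
        return "equal"
    return "greater"

for c3, c0 in ((0, 0), (0, 1), (1, 0), (1, 1)):
    print(c3, c0, branch(c3, c0))
# 0 0 greater
# 0 1 less
# 1 0 equal
# 1 1 non-comparable
```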


Segments 

The ability of the 8088 to address over a million bytes of memory provides
PC owners with far greater power than was available on old 8-bit
machines. The designers of the 8088 had to solve a difficult problem in order
to access such a large address space. 8088 registers are 16 bits wide, and
2^16 is 64K. Addressing a megabyte requires 20 bits. The solution is found in
the 8088 segment registers.

The 8088 has four internal 16-bit registers called segment registers.
When calculating an address, the 8088 picks the value from one of the
segment registers, shifts it left four places, and then adds the logical
address made up from displacement and index registers. The resulting
20-bit address is the physical address that actually goes out to memory.
For the most part, we ignore the segment registers. However, we sometimes
need to manipulate them when dealing with subroutines. For example, the
BASIC statement DEF SEG = defines the beginning of a segment.
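The address calculation can be sketched in Python (a model; the function name is hypothetical). Many segment:offset pairs name the same byte, and because the 8088 has only 20 address lines, a sum that overflows 20 bits wraps around to the bottom of memory.

```python
def physical_address(segment, offset):
    """Model of 8088 address formation: segment * 16 + offset, 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

print(hex(physical_address(0x1800, 0x0000)))   # 0x18000
print(hex(physical_address(0x1800, 0x0100)))   # 0x18100
print(hex(physical_address(0x1810, 0x0000)))   # 0x18100: same byte, another segment
print(hex(physical_address(0xFFFF, 0x0010)))   # 0x0: wraps around
```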

An address is completely specified by giving both a segment location
and an offset location. For example, location 100 in the data segment can
be written DS:100. The assembler directives SEG and OFFSET separate
a complete address back into its component parts. For example, if the
complete address of A is 8000:100, then SEG A equals 8000 and OFFSET
A equals 100.

The four segment registers are CS (code segment), DS (data segment),
SS (stack segment), and ES (extra segment).

Since the 8088 uses an area in memory for the stack, we can choose 
its size. (You will remember that the 8087 stack was limited to eight 
items.) We need to know about the stack for two reasons. First, BASIC 
passes arguments to subroutines by placing each argument's address on 
the stack. Second, we can temporarily save small amounts of information 
on the stack without having to allocate extra storage. 

On the 8088, the stack segment register, SS, gives the location of the 
stack segment. The stack pointer register, SP, points to the top of the stack. 
The stack grows (upside) down in memory, progressing toward location 
zero as it grows. Since the stack is just an area in memory, we can access 
data on it with any of the usual 8088 instructions. For example, "MOV 
BX,SP" and "MOV WORD PTR SS:[BX],0" will move the offset of the 
stack top into register BX and then replace the element on top of the 
stack with a zero. 

Usually, however, we use the stack manipulation instructions PUSH 
and POP. 
PUSH source

PUSH (Push) subtracts two from the stack pointer, SP, and then transfers 
two bytes from the source into the word at SS:SP. 

POP destination 

POP (Pop) transfers two bytes from SS:SP to the destination and then 
adds two to SP. POP effectively undoes the previous PUSH. 

• As an aid to manipulating data on the stack, whenever you code BP 
as a base register, as in "[BP] +10", the 8088 assumes you want the 
stack segment rather than the data segment. 
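The PUSH and POP rules can be modeled in Python (a toy sketch with hypothetical names; memory is a dictionary of bytes indexed by offset within the stack segment). The model also shows that a word is stored low byte first, at the lower address.

```python
class Stack8088:
    """Toy model of the 8088 stack: bytes in the stack segment plus SP."""

    def __init__(self, sp):
        self.mem = {}      # offset -> byte
        self.sp = sp

    def push(self, word):
        self.sp = (self.sp - 2) & 0xFFFF        # SP moves toward zero first
        self.mem[self.sp] = word & 0xFF         # low byte at the lower address
        self.mem[self.sp + 1] = (word >> 8) & 0xFF

    def pop(self):
        word = self.mem[self.sp] | (self.mem[self.sp + 1] << 8)
        self.sp = (self.sp + 2) & 0xFFFF
        return word

s = Stack8088(sp=200)
s.push(0x1234)
s.push(0xABCD)
print(s.sp)            # 196
print(hex(s.pop()))    # 0xabcd: last in, first out
print(hex(s.pop()))    # 0x1234
print(s.sp)            # 200: each POP undid a PUSH
```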


Subroutine Branching and Returns 

BASIC has the GOSUB and RETURN. The 8088 has CALL and RET. We 
describe the 8088 calling and returning mechanism here. The next chapter 
treats the BASIC-to-8088/8087 routine-calling mechanism in depth. 

CALL far-procedure-name 
CALL near-procedure-name 

CALL (Call a subroutine) is actually two instructions: one for calling
subroutines in another segment, CALL far; and one for calling
subroutines within the current code segment, CALL near. BASIC always uses
a far CALL. Near CALLs are used in writing relocatable subroutines.

CALL far pushes CS and IP (the instruction pointer, which holds the 
address of the next instruction) onto the stack. The address of the code 
segment of the subroutine and the location of the subroutine within the 
segment are taken from the procedure-name argument. (The assembler 
fills these in automatically.) CS is set to the address of the new code 
segment and execution begins at the beginning of the new subroutine. 

These conventions should sound a bit familiar to anyone who has called
a machine language routine from BASIC. The DEF SEG = statement tells
BASIC what value to load into CS. The command CALL SUB() tells BASIC
to do an 8088 CALL to location SUB in the new code segment.

CALL near pushes IP onto the stack and jumps to the location given
as the procedure name.

RET immediate-operand 

RET (Return) effectively undoes a CALL. The assembler codes a RET to
undo either a far CALL or a near CALL, depending on whether the
current procedure is marked FAR or NEAR. (See Assembler Directives
below.) In a FAR return, the top two words are popped off the stack.
The first gives the address of the next instruction and the second a new
value for CS. In a NEAR return, one word is popped off the stack and
then used as the address of the next instruction. In either case,
"immediate-operand" additional bytes are then popped off the stack. (The
stack operates on words, not bytes. However, addresses are always specified
in bytes. So, to pop one extra word, code "RET 2".) The immediate-operand
is optional.
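The bookkeeping of a FAR CALL and a RET with an immediate operand can be traced in a Python model (a sketch; the helper names and the CS, IP, and argument addresses are made up, and each dictionary entry stands for one stack word). Three argument addresses plus the two words pushed by the CALL lower SP by ten bytes, and RET 6 brings it back.

```python
def push(stack, sp, word):
    """PUSH: subtract two from SP, then store the word at SS:SP."""
    sp -= 2
    stack[sp] = word
    return sp

def pop(stack, sp):
    """POP: read the word at SS:SP, then add two to SP."""
    return stack[sp], sp + 2

def call_far(stack, sp, cs, ip):
    """CALL far: push the caller's CS, then the return offset (IP)."""
    sp = push(stack, sp, cs)
    return push(stack, sp, ip)

def ret_far(stack, sp, pop_bytes=0):
    """RET n in a FAR procedure: pop IP, pop CS, then discard n more bytes."""
    ip, sp = pop(stack, sp)
    cs, sp = pop(stack, sp)
    return sp + pop_bytes, cs, ip

stack = {}                                       # stack offset -> word
sp = 200
for arg_address in (0x0010, 0x0810, 0x0812):   # hypothetical argument addresses
    sp = push(stack, sp, arg_address)
sp = call_far(stack, sp, cs=0x2000, ip=0x0042)  # hypothetical caller CS:IP
print(sp)                                        # 190
sp, cs, ip = ret_far(stack, sp, pop_bytes=6)     # RET 6
print(sp, hex(cs), hex(ip))                      # 200 0x2000 0x42
```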

Note that PUSH/POP and CALL/RET are matched pairs, much like
FOR/NEXT or WHILE/WEND in BASIC or parentheses in mathematics.
If the pairing is mismatched, things go very wrong.


Assembler Directives 

Assembler directives aren't actually 8088 instructions. Rather, they
supply the assembler program with necessary information. We've already
met some of the most important assembler directives above under
"Labels and Data Definition." The other important directives follow:

label SEGMENT 'class'


label ENDS 

SEGMENT and ENDS (END Segment) define the enclosed series of code 
or data definitions to be a segment named "label". The segment may 
optionally be given a "class" in single quotes. Because some software 
looks for the class of a segment, it is a good idea to give a code segment 
the class 'CODE' and a data segment the class 'DATA'. 

ASSUME CS:segment-label1,DS:segment-label2,
SS:segment-label3,ES:segment-label4
ASSUME promises the assembler that the segment registers will contain
the indicated segment addresses. (It's the programmer's responsibility
to see to it that the promise is kept at execution time.) Since a section of
code always has a code segment, "CS:..." must always be present; the
three remaining ASSUME specifications appear as needed.

label PROC FAR


label ENDP 

PROC (PROCedure) and ENDP (END Procedure) mark the boundaries
of a procedure just as SEGMENT and ENDS mark the boundaries of a
segment. FAR signals the assembler that the procedure will be called
with a CALL FAR instruction. When the assembler sees a RET
instruction, it generates a far return. (For a NEAR procedure, which you
can't call from BASIC, code NEAR in place of FAR.)

PUBLIC symbol
EXTRN name:type

PUBLIC and EXTRN (EXTeRNal) are used to supply information
necessary for linking together separately assembled or compiled programs.
Information about a symbol defined to be PUBLIC is made available to
other programs. EXTRN tells the assembler to treat "name," which has
been defined in another program, as being of type "type." The following
example shows the most common use of PUBLIC and EXTRN.



program1

        PUBLIC  LABEL1
LABEL1  PROC    FAR
        ...
LABEL1  ENDP

program2

        EXTRN   LABEL1:FAR
        ...
        CALL    LABEL1


Any label declared PUBLIC can be accessed by any program declaring 
the same name to be EXTRN. A label which is to be used by separately 
assembled programs should be declared PUBLIC. The declaration should 
be made exactly once, in the program where the label is defined. Any 
number of other programs may declare the label EXTRN. In particular, 
the name of an assembly language procedure should be declared PUBLIC 
if the procedure is to be called as a BASIC subroutine. 

SEGMENT/ENDS and PROC/ENDP are also matched pairs. Since these
directives carry labels, the assembler will probably catch the error if you
omit half of either pair.

END 

END marks the end of the entire assembly language program. 

This chapter has been heavy on required detail. In Chapter 8, we put 
this detail to work writing real 8087 programs. 




BASIC and the 8087 


Assembly language subroutines, in combination with BASIC programs, 
join the convenience of a high-level language with the speed of the 8087. 
In this chapter, we discuss the software conventions that must be
observed in writing the 8087 routines. (If you want to use the 8087
procedures in this book for languages other than Microsoft BASIC, you may
have to observe different conventions.)


Calling a Subroutine 

Calling a subroutine requires three tasks. First, we have to set up a list 
of arguments that can be retrieved by the subroutine. Second, we have 
to store away a return address in a place the subroutine can find. Third, 
we jump to the subroutine. The CALL instruction takes care of the latter 
two tasks. The first is accomplished by pushing the addresses of the 
arguments onto the 8088 stack. 

Calling a subroutine is most easily explained with an illustration.
Suppose we wanted to imitate the following BASIC code:

DEF SEG=&H1800
SUB = 0

CALL SUB(A(0),SUM,N)

We could use the following 8088 program: 


        ASSUME CS:CSEG,DS:DATA_SEGMENT,SS:STACK_SEGMENT           ;1
CSEG    SEGMENT 'CODE'                                            ;2
        MOV  AX,DATA_SEGMENT      ;MOVE ADDRESS OF DATA SEGMENT   ;3
        MOV  DS,AX                ;THROUGH AX INTO DS             ;4
        MOV  AX,STACK_SEGMENT     ;MOVE ADDRESS OF STACK SEGMENT  ;5
        MOV  SS,AX                ;THROUGH AX INTO SS             ;6
        MOV  SP,OFFSET STACK_TOP  ;SET SP TO STACK TOP            ;7
        MOV  AX,OFFSET A          ;PUSH ADDRESS OF A              ;8
        PUSH AX                   ;ONTO STACK                     ;9
        MOV  AX,OFFSET SUM        ;PUSH ADDRESS OF SUM            ;10
        PUSH AX                   ;ONTO STACK                     ;11
        MOV  AX,OFFSET N          ;PUSH ADDRESS OF N              ;12
        PUSH AX                   ;ONTO STACK                     ;13
        CALL FAR PTR 1800H:0      ;CALL SUBROUTINE                ;14
NEXT_LOCATION:                    ;RETURN HERE WHEN SUBROUTINE ENDS
CSEG    ENDS                                                      ;15

DATA_SEGMENT  SEGMENT 'DATA'                                      ;16
A       DW   1000 DUP (?)                                         ;17
SUM     DW   ?                                                    ;18
N       DW   1000                                                 ;19
DATA_SEGMENT  ENDS                                                ;20

STACK_SEGMENT SEGMENT 'STACK'                                     ;21
STACK_AREA    DW   100 DUP (?)                                    ;22
STACK_TOP     EQU  THIS WORD                                      ;23
STACK_SEGMENT ENDS                                                ;24

        END                                                       ;25


1. ASSUME CS:... ASSUME promises the assembler we will
set up the segment registers appropriately.

2. CSEG SEGMENT 'CODE'. Tell the assembler we are beginning the
code segment.

3-4. MOV AX,DATA_SEGMENT and MOV DS,AX. Put the address of the
data segment into the data segment register, by transferring it
through the AX register. We require two steps because the
MOV instruction allows immediate operands, like an address,
to be moved into memory or a general register, but not into a
segment register.

5-6. MOV AX,STACK_SEGMENT and MOV SS,AX. Put the address of the
stack segment into the stack segment register.

Note that we do not have to load the code segment register.
Someone else must have already done this for us since we can't
execute code to load the code segment register, or to do
anything else, until the code segment register is loaded. The
program that starts our code is responsible for loading CSEG
into CS. (And how does that program get CS loaded? And the
one that calls it? The operating system initially loads the CS
register when it first calls BASIC (or whatever). The CS value
for the operating system is wired into the hardware.)

7. MOV SP,OFFSET STACK_TOP. Set the stack pointer register to
point to the memory location after the end of the stack area.
We could have written "MOV SP,OFFSET STACK_AREA+200" with
identical results. But by doing it this way, the assembler will
load the correct address for the stack top even if we decide to
change the size of the stack in line 22.

8-9. MOV AX,OFFSET A and PUSH AX. We now push the addresses
of the arguments onto the stack, in the order of appearance in
the CALL statement. Since PUSH does not allow an immediate
operand, we again have to go through a general register. The
assembler directive "OFFSET" tells the assembler to load the
address of A rather than the value of the number stored in A.
("OFFSET" means use the address relative to the beginning of
the segment.) The convention of passing the address of an
argument, instead of its value or its name, is sometimes called
a "call by address."

10-13. MOV AX,OFFSET SUM and PUSH AX and MOV AX,OFFSET N and
PUSH AX. The addresses of SUM and N are pushed in a similar
manner. Notice that no distinction is made between a scalar
variable and the first word of an array.

14. CALL FAR PTR 1800H:0. CALL a FAR procedure. The current
contents of the CS register and the Instruction Pointer (the
address NEXT_LOCATION) are pushed onto the stack. Then
CS is set to 1800H. ("H" indicates hexadecimal to the assembler
just as "&H" does to BASIC. Hex numbers must start with a digit,
not a letter; for example, 0AH, not AH, so that the assembler
can distinguish a number from a name.) The program then
jumps to location 0 in a code segment beginning at 18000H.
(Remember that segment registers always have four zero bits
added at the right.)

15. CSEG ENDS. Tell the assembler we are ending the code segment. 

16. DATA_SEGMENT SEGMENT 'DATA'. Tell the assembler we are
beginning the data segment. The assembler is smart enough to
know that "OFFSET A" is an address in the data segment and
that "OFFSET STACK_TOP" is an address in the stack segment.

17. A DW 1000 DUP (?). Set aside 1000 uninitialized words for A.

18. SUM DW ?. Set aside one uninitialized word for SUM.

19. N DW 1000. Set aside one word for N, initialized to 1000.

20. DATA_SEGMENT ENDS. End the data segment.

21. STACK_SEGMENT SEGMENT 'STACK'. Begin the stack segment.

22. STACK_AREA DW 100 DUP (?). Set aside 100 words for the stack.

23. STACK_TOP EQU THIS WORD. STACK_TOP is equivalent to the
address appearing after the 100 words allocated for
STACK_AREA.

24. STACK_SEGMENT ENDS. End the stack segment.

25. END. End the program. 

The receiving subroutine finds the DS and SS registers pointing to the 
data and stack segments defined above. The CS register points to 1800 
hex. Most of the important information appears on the stack, which is 
shown in Figure 8.1. Remember that the 8088 stack actually grows upside
down in memory, so that as we push addresses onto the stack, SP moves
toward zero. Since we have pushed five words onto the stack (three
argument addresses, CS, and NEXT_LOCATION), SP equals
(STACK_AREA+200)-10, or STACK_AREA+190.
        STACK_AREA+198   OFFSET A
        STACK_AREA+196   OFFSET SUM
        STACK_AREA+194   OFFSET N
        STACK_AREA+192   CSEG
        STACK_AREA+190   NEXT_LOCATION   <-- SP POINTS HERE

Figure 8.1. Memory stack after subroutine call.
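Figure 8.1 can be regenerated with a short Python model of the caller's pushes (a sketch; offsets are relative to STACK_AREA, each entry is one two-byte word, and the function name is hypothetical).

```python
def build_call_stack(arg_labels, stack_top=200):
    """Model the caller's argument pushes followed by a far CALL.

    Returns the stack contents (offset -> label) and the final SP.
    """
    stack = {}
    sp = stack_top                     # MOV SP,OFFSET STACK_TOP
    for label in list(arg_labels) + ["CSEG", "NEXT_LOCATION"]:
        sp -= 2                        # PUSH (or CALL) lowers SP first...
        stack[sp] = label              # ...then stores the word at SS:SP
    return stack, sp

stack, sp = build_call_stack(["OFFSET A", "OFFSET SUM", "OFFSET N"])
for offset in sorted(stack, reverse=True):
    print(f"STACK_AREA+{offset}: {stack[offset]}")
print(f"SP = STACK_AREA+{sp}")
# STACK_AREA+198: OFFSET A
# STACK_AREA+196: OFFSET SUM
# STACK_AREA+194: OFFSET N
# STACK_AREA+192: CSEG
# STACK_AREA+190: NEXT_LOCATION
# SP = STACK_AREA+190
```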


Acting Like a Called Subroutine


Machine language subroutines called from BASIC must obey a number 
of rules. The important ones are: 

• At entry, CS is set according to the last DEF SEG. The other segment 
registers point to the beginning of BASIC's data area. 

• At exit, all segment registers and registers SP and BP should hold 
their original values. The other registers, and the flags, may be 
changed. 

• BASIC promises that the stack pointed to by SP will have eight free 
words. If the subroutine needs a larger stack, it must set up its own. 

• The subroutine must pop the argument addresses off the stack before 
returning. 

Let's write a subroutine, to add up an array of integers, in a form that 
could be called by the code sequence appearing in the preceding section. 


        PUBLIC SUB                                    ;1
        ASSUME CS:CSEG                                ;2
CSEG    SEGMENT 'CODE'                                ;3
SUB     PROC FAR                                      ;4
        PUSH BP                  ;SAVE BP             ;5
        MOV  BP,SP               ;FIND ARGUMENT LIST  ;6
        MOV  BX,[BP]+10          ;ADDRESS A           ;7
        MOV  SI,[BP]+6           ;ADDRESS N           ;8
        MOV  CX,[SI]             ;CX GETS N           ;9
        MOV  AX,0                ;CLEAR AX            ;10
ADD_LOOP:                                             ;11
        ADD  AX,WORD PTR [BX]    ;ADD A[BX]           ;12
        ADD  BX,2                ;NEXT ELEMENT        ;13
        LOOP ADD_LOOP            ;DO IT AGAIN         ;14
        MOV  DI,[BP]+8           ;ADDRESS SUM         ;15
        MOV  [DI],AX             ;STORE SUM           ;16
        POP  BP                  ;RESTORE BP          ;17
        RET  6                   ;RETURN              ;18
SUB     ENDP                                          ;19
CSEG    ENDS                                          ;20
        END                                           ;21


1-3. The PUBLIC, ASSUME, and SEGMENT statements supply the usual 
information to the assembler. 

4. PROC FAR tells the assembler that this routine will be called with
a FAR CALL, information it needs to generate the proper type
of return instruction in line 18.

5. PUSH BP. Save the value of the BP register by pushing it onto
the stack for later retrieval. Note this instruction subtracts two
from SP, so SP now equals STACK_AREA+188.

6. MOV BP,SP. Copy the stack pointer, SP, into BP. The
instructions that follow retrieve information from the stack. BP can
serve as a base register, as in [BP], while SP cannot.

7. MOV BX,[BP]+10. Copy the contents of [BP]+10 into BX. Since
BP equals STACK_AREA+188, [BP]+10 is STACK_AREA+198.
STACK_AREA+198 holds OFFSET A, so after this instruction
BX holds the address of the first word of A.

8. MOV SI,[BP]+6. By the same logic, move the address of N into
SI.

9. MOV CX,[SI]. Now move the value of N into the count register,
CX.

10. MOV AX,0. Clear out the accumulator, AX.

11. ADD_LOOP:. Label the top of the loop. Notice that this loop does
not worry about errors such as negative or zero N nor about
the accumulator overflowing. (Not very good programming
practice!)

12. ADD AX,WORD PTR [BX]. Add the element of A currently pointed to by
BX into AX. The first time through, this is A(0); the second
time, A(1); and so forth.

13. ADD BX,2. Increment BX by 2 so it points to the next word.

14. LOOP ADD_LOOP. Decrement the count register and jump back
up to the top of ADD_LOOP if we haven't run the count down
to zero.

15. MOV DI,[BP]+8. Move the address of SUM into DI.

16. MOV [DI],AX. Move the contents of AX into the address pointed
at by the DI register, that is, into SUM.

17. POP BP. Now restore the original value of BP. Also, add two
to SP.

18. RET 6. Set the Instruction Pointer to point to NEXT_LOCATION
and set CS equal to CSEG, adding four to SP in the process. Then
add the optional pop value, 6, to SP. Now SP equals
STACK_AREA+200, as it did before the subroutine was called.

19-21. SUB ENDP, CSEG ENDS, and END. Tell the assembler to close
up the procedure, segment, and program.

In coding the subroutine, a pattern appears. 

• If the subroutine is called with "n" arguments, then the address of
argument "i" is stored in [BP]+6+2*(n-i). In other words, the
right-most argument has its address stored at [BP]+6; the
right-most-but-one is at [BP]+8; one further to the left is at [BP]+10; and
so forth. (These addresses are valid after we set BP, as in lines 5 and
6, with a PUSH and a MOV.)

• It takes one instruction to retrieve the address of an argument and two
to retrieve the argument's value.

• The last instruction should be RET 2*n, where n is the number of
arguments.
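The pattern above can be checked with a small model of the stack. This is a sketch under stated assumptions, not the author's code: BASIC is assumed to push one 2-byte address per argument from left to right, the FAR CALL to push CS and IP, and the routine to execute PUSH BP and MOV BP,SP. The function names `arg_offset`, `ret_operand`, and `trace_sp` are hypothetical.

```python
# Sketch of the stack frame described above. Assumptions, for illustration
# only: BASIC pushes one 2-byte address per argument, left to right; the FAR
# CALL pushes CS and IP (4 bytes); the routine then does PUSH BP / MOV BP,SP.


def arg_offset(n: int, i: int) -> int:
    """Offset from BP of the address of argument i (1 = left-most) of n."""
    if not 1 <= i <= n:
        raise ValueError("argument index out of range")
    return 6 + 2 * (n - i)


def ret_operand(n: int) -> int:
    """Pop value for RET, discarding the n argument addresses."""
    return 2 * n


def trace_sp(sp_before_call: int, n: int) -> tuple:
    """Return (BP after MOV BP,SP, SP after RET) for an n-argument routine."""
    sp = sp_before_call - 2 * n    # BASIC pushes the argument addresses
    sp -= 4                        # FAR CALL pushes CS and IP
    sp -= 2                        # PUSH BP
    bp = sp                        # MOV BP,SP
    sp += 2                        # POP BP
    sp += 4 + ret_operand(n)       # RET: pop IP and CS, then the arguments
    return bp, sp


# SUB(A,N,SUM) with STACK_AREA taken as 0, so SP = 200 before the call.
bp, sp_after = trace_sp(200, 3)
print(bp, bp + arg_offset(3, 1), sp_after)   # 188 198 200
```

The three printed numbers correspond to the values in steps 5, 7, and 18 of the walk-through.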


Subroutine Relocation and Segment 
Addressing 

The BASIC command BLOAD allows us to load a subroutine at any
memory location. It is therefore highly desirable that our 8087 routines
be dynamically relocatable. We can run into difficulty if the segment
addresses at which a routine is initially loaded (see "Loading a Routine
into Interpreted BASIC") differ from those at which we later BLOAD the
routine. Dynamic relocation is automatic for programs which do not
explicitly reference segment locations, but is somewhat more complicated
otherwise.

For the purposes of this discussion, suppose we had initially loaded 
SUB with DEF SEG = &H1800 and then BSAVED it from this location 
with an offset of zero. 

Suppose we now load SUB back in at DEF SEG = &H1900. When BASIC
calls SUB, it sets the code segment register to &H1900 and the instruction
pointer to zero. Execution proceeds correctly.

Suppose instead that we load SUB at DEF SEG = &H1900 and offset
125. SUB "thinks" it will find the first instruction at offset zero in the
code segment. Actually, the first instruction is at offset 125. However,
when we call SUB we specify the offset, so BASIC sets the instruction pointer
to 125. All the instructions we have used (though not every instruction
the 8088 knows) work relative to the instruction pointer or to addresses
held in registers; LOOP, for example, stores its target as a displacement
from the instruction pointer. SUB still executes correctly.
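A short model shows why a relative branch is unaffected by where the routine is loaded. This is an illustrative sketch: the 8088 forms a physical address as 16*segment + offset, and the positions used (a 2-byte LOOP at byte 8 that jumps back to byte 4) are hypothetical, not taken from SUB's listing.

```python
# Sketch: why a relative branch such as LOOP survives relocation. The 8088
# forms a physical address as 16*segment + offset, and LOOP stores only a
# signed displacement from the instruction that follows it. The positions
# used here (a 2-byte LOOP at byte 8 jumping back to byte 4) are hypothetical.


def physical(segment: int, offset: int) -> int:
    return 16 * segment + offset


def branch_target(load_offset: int, insn_pos: int, insn_len: int, disp: int) -> int:
    """Offset reached by a relative branch in a routine loaded at load_offset."""
    ip_after = load_offset + insn_pos + insn_len   # IP once the branch is fetched
    return ip_after + disp


DISP = 4 - (8 + 2)   # -6: fixed once, at assembly time

for segment, load_offset in [(0x1800, 0), (0x1900, 0), (0x1900, 125)]:
    target = branch_target(load_offset, 8, 2, DISP)
    # The branch always lands 4 bytes past wherever the routine really starts.
    print(hex(segment), load_offset, target - load_offset, hex(physical(segment, target)))
```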

SUB is fully relocatable. What sort of subroutine isn't? Unfortunately, 
any subroutine that explicitly contains a value for a segment register is 
not relocatable, since the segment may end up at some other memory 
location than the one originally specified. This is particularly a problem 
when we define a data, extra, or stack segment inside a routine. 
Consider the following, not very useful, routine.


EXTRA_SEG SEGMENT 'DATA'                        ;1
FOOLISH DW      ?                               ;2
EXTRA_SEG ENDS                                  ;3
;SUBROUTINE SILLY(JUNK%)
        PUBLIC  SILLY                           ;4
        ASSUME  CS:CSEG,ES:EXTRA_SEG            ;5
CSEG    SEGMENT 'CODE'                          ;6
SILLY   PROC    FAR                             ;7
        PUSH    BP                              ;8
        MOV     BP,SP                           ;9
        PUSH    ES              ;POINT ES       ;10
        MOV     AX,EXTRA_SEG    ;AT             ;11
        MOV     ES,AX           ;EXTRA SEGMENT  ;12
        MOV     AX,FOOLISH      ;RETURN WHATEVER ;13
        MOV     DI,[BP]+6       ;NUMBER WAS     ;14
        MOV     [DI],AX         ;LYING AROUND   ;15
        POP     ES                              ;16
        POP     BP                              ;17
        RET     2                               ;18
SILLY   ENDP                                    ;19
CSEG    ENDS                                    ;20
        END                                     ;21


This subroutine references the extra segment (if not to any good
purpose). Instructions 1-9 and 14-21 are standard. Lines 10, 11, and 12 save
ES on the stack and then load the address of EXTRA_SEG into ES. Line
13 copies FOOLISH. (Note that the assembler should be smart enough
to use ES to reference FOOLISH.) Subroutine SILLY will work if loaded
and used at one location, since the loader will figure out the value for
EXTRA_SEG. However, if we relocate SILLY, EXTRA_SEG will no longer
be at its original location, and unpredictable consequences may ensue.

We can make SILLY relocatable by having the subroutine figure out
for itself how far it's been moved from its original location. The subroutine
"thinks" it begins at location 16*CSEG. In truth, when BLOADed by
interpreted BASIC, SILLY begins at 16*DEF SEG + offset. Similarly, the
subroutine thinks the extra segment begins at 16*EXTRA_SEG, while it
actually begins at 16*EXTRA_SEG + (16*DEF SEG + offset - 16*CSEG).
We can use this relation to correctly load segment registers. Life is
complicated a bit more because the only way to find "offset" is by
examining the value of the instruction pointer at entry.
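Dividing that relation by 16 gives the segment value that belongs in ES. The derivation below restates the text's relation; the only added fact is that CS equals DEF SEG when BASIC calls the routine. The result is a whole segment number only when offset is a multiple of 16.

```latex
\begin{aligned}
\text{actual start} &= 16\,\mathrm{EXTRA\_SEG} + \bigl(16\,\text{DEF SEG} + \mathit{offset} - 16\,\mathrm{CSEG}\bigr) \\
&= 16\Bigl(\text{DEF SEG} + (\mathrm{EXTRA\_SEG} - \mathrm{CSEG}) + \tfrac{\mathit{offset}}{16}\Bigr), \\
\mathrm{ES} &= \mathrm{CS} + (\mathrm{EXTRA\_SEG} - \mathrm{CSEG}) + \tfrac{\mathit{offset}}{16}
\qquad (\mathrm{CS} = \text{DEF SEG at entry}).
\end{aligned}
```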

The following subroutine, SMART, will work correctly, as long as the 
code segment and extra segment are loaded together at a memory location that 
is an even multiple of 16. 
EXTRA_SEG SEGMENT 'DATA'                        ;1
FOOLISH DW      ?                               ;2
EXTRA_SEG ENDS                                  ;3
;SUBROUTINE SMART(JUNK%)
        PUBLIC  SMART                           ;4
        ASSUME  CS:CSEG,ES:EXTRA_SEG            ;5
CSEG    SEGMENT 'CODE'                          ;6
FIRST_INST EQU  THIS WORD                       ;7
SMART   PROC    FAR                             ;8
        PUSH    BP                              ;9
        MOV     BP,SP                           ;10
        PUSH    ES                              ;11
        CALL    NEXT                            ;12
NEXT:   POP     AX                              ;13
        SUB     AX,(OFFSET NEXT)-(OFFSET FIRST_INST) ;14
        MOV     CL,4                            ;15
        SHR     AX,CL                           ;16
        MOV     BX,CS                           ;17
        ADD     BX,EXTRA_SEG                    ;18
        SUB     BX,CSEG                         ;19
        ADD     AX,BX                           ;20
        MOV     ES,AX                           ;21
        MOV     AX,FOOLISH                      ;22
        MOV     DI,[BP]+6                       ;23
        MOV     [DI],AX                         ;24
        POP     ES                              ;25
        POP     BP                              ;26
        RET     2                               ;27
SMART   ENDP                                    ;28
CSEG    ENDS                                    ;29
        END                                     ;30


Lines 1-6, 8-10, 22-24, and 26-30 are standard.

7. FIRST_INST EQU THIS WORD. Define the location of the first
instruction in the code segment to be FIRST_INST. (FIRST_INST
equals zero here.)

11. PUSH ES. Save ES on the stack. Note that we don't change BP, so
argument references don't change.

12-13. CALL NEXT and NEXT: POP AX. This is a devious way to retrieve
the instruction pointer. CALL pushes IP onto the stack. (The
pushed instruction pointer holds the true offset of NEXT, no
matter where the routine is located.) POP pops the stack into
AX. Now AX holds the true offset of NEXT.

14. SUB AX,(OFFSET NEXT)-(OFFSET FIRST_INST). Now we subtract
the expected offset of NEXT from the true offset. AX now
holds the number of bytes by which the offset of SMART has
changed as compared to the position at which it was originally
loaded.

15-16. MOV CL,4 and SHR AX,CL. Divide AX by 16 (a right shift of four
bits), since we are going to set a segment register. Notice that if the
program was relocated by any amount other than a multiple of 16, the
program will bomb in an unpredictable manner. Nor will any other method
work, since the 8088 requires segments to begin at addresses that are
multiples of 16.

17-19. MOV BX,CS and ADD BX,EXTRA_SEG and SUB BX,CSEG. Figure
out how far the code segment has been displaced from its
original location and how far the extra segment is from the code
segment.

20-21. ADD AX,BX and MOV ES,AX. Combine the offset and segment
correction and set ES.
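The register arithmetic of lines 12-21 can be replayed in Python to confirm that it agrees with the relocation formula given earlier. This is a sketch, not the author's code: the segment values (CSEG = &H4F14, EXTRA_SEG = &H4F17), the offset of NEXT (7), and the helper names `smart_es` and `expected_es` are illustrative assumptions, and FIRST_INST is taken as zero, as in the listing.

```python
# Sketch: SMART's lines 12-21 replayed with 16-bit arithmetic. The segment
# values used below (CSEG = 0x4F14, EXTRA_SEG = 0x4F17, NEXT at offset 7)
# are illustrative assumptions; FIRST_INST is taken as 0.

MASK16 = 0xFFFF


def smart_es(cs: int, load_offset: int, cseg: int, extra_seg: int,
             next_offset: int = 7) -> int:
    ax = (load_offset + next_offset) & MASK16   # CALL NEXT / NEXT: POP AX
    ax = (ax - next_offset) & MASK16            # SUB AX,(OFFSET NEXT)-(OFFSET FIRST_INST)
    ax >>= 4                                    # MOV CL,4 / SHR AX,CL
    bx = (cs + extra_seg) & MASK16              # MOV BX,CS / ADD BX,EXTRA_SEG
    bx = (bx - cseg) & MASK16                   # SUB BX,CSEG
    return (ax + bx) & MASK16                   # ADD AX,BX / MOV ES,AX


def expected_es(def_seg: int, load_offset: int, cseg: int, extra_seg: int) -> int:
    """Segment of the relocated extra segment, from the formula in the text."""
    start = 16 * extra_seg + (16 * def_seg + load_offset - 16 * cseg)
    if start % 16:
        raise ValueError("relocation is not a multiple of 16 bytes")
    return start // 16


print(hex(smart_es(0x1800, 0, 0x4F14, 0x4F17)))      # BLOAD at DEF SEG=&H1800, offset 0
print(hex(smart_es(0x1900, 0x20, 0x4F14, 0x4F17)))   # DEF SEG=&H1900, offset &H20
```

Note that the shift in line 16 silently discards the low four bits of the displacement, which is why a relocation that is not a multiple of 16 produces a wrong ES rather than an error.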

25. POP ES. Restore ES before leaving the routine. 

While all this manipulation is a bit of a nuisance, it is worth the extra 
trouble to be able to more easily load subroutines into BASIC. If you only 
use a compiler, then relocation is handled by the LINK program and this 
extra code is unnecessary. 


Loading Assembly Language Programs 

At the end of the chapter, we show two complete interactive sessions in
which SMART is used in a BASIC program: one session for the interpreted
BASIC built into the IBM Personal Computer and one session for IBM's
BASIC compiler. The remainder of this chapter describes the general
steps involved. These procedures are more specific to the IBM Personal
Computer running PC-DOS than most of the material in the book. If you
have a different machine or different software (especially if you are not
using Microsoft software), you may have to adjust these procedures
somewhat.


Loading a Routine into Interpreted BASIC

The assembler transforms an 8087/8088 source program into an object
module. Several further steps are required to get the routine into a form
suitable for BLOADing into BASIC. These steps involve running the
program through the LINKer, through DEBUG, and finally through BASIC.
Suppose we begin with a program held in file FOO.ASM.

The ASSEMBLER replaces the instructions and (most) addresses with 
their binary representation and creates a file FOO.OBJ. 

LINK is able to combine several different object files. It creates FOO.EXE. 

We use DEBUG to load FOO.EXE. DEBUG figures out the actual memory
address at which each segment begins. We can also ask DEBUG to
tell us where the program begins.
Finally, we use BASIC to BSAVE the routine. Once the routine is
BSAVEd, we can BLOAD it whenever desired.

The exact procedure for getting from FOO.ASM to the BSAVEd version
is described in Appendix C of the IBM PC BASIC manual. (The
descriptions of LINK and DEBUG in the DOS manual supply some additional
information.) The exact procedure may vary according to which version
of DOS and BASIC you use. The steps described below usually work for
the author.

1. Assemble FOO.ASM. (Be warned that the assembler occasionally 
produces erroneous error messages.) 

2. Link FOO.OBJ. Tell the linker to load HIGH (LOW is the default). 
Get a MAP file from LINK so that you can find the total length of 
the output file, FOO.EXE. If FOO doesn't have a stack segment, 
LINK will report its absence as an error. Ignore this message. 

3. Start DEBUG with DEBUG BASIC.COM.

4. Type "r" to examine the registers. Copy down the values of CS, 
SS, IP, and SP. 

5. Enter "N FOO.EXE". Type "L". This tells DEBUG to load your 
routine. 

6. Type "r" again. Copy down the new values of CS and IP. 

7. Restore SS and SP by using the "r" command. Enter "RSS". The 
computer will tell you the current value of SS. Respond by entering 
the value of SS you copied down in step 4. Now enter "RSP" and 
respond to the computer with the value of SP from step 4. 

8. Enter "g=CS:IP" where CS and IP are replaced by the values copied 
down in step 4. 

9. BASIC should start up now, possibly with an irrelevant warning
about a DIRECT STATEMENT IN FILE. Execute DEF SEG = &Hxxxx, where
xxxx is the value of CS copied down in step 6 (DEBUG displays it in
hexadecimal). Execute a BSAVE filespec,offset,length command, where
filespec gives the name of the file in which you wish to save the routine,
offset is the value of IP from step 6, and length is the length in bytes
of FOO (from the MAP file of step 2).

From now on, to use FOO from BASIC just execute a "DEF SEG =" and a
"BLOAD filespec".

Loading a Routine into Compiled BASIC

Combining an assembled program with the output of the PC-BASIC 
compiler is considerably easier than loading the program into interpreted 
BASIC. 

1. Assemble FOO.ASM. Include subroutine names in a PUBLIC
statement.

2. Compile the BASIC program. Omit DEF SEG and BLOAD
statements. You need not worry about the location of the subroutine in
memory.
3. LINK the output of the BASIC compiler together with FOO.

4. Execute the ".EXE" module. 


Interactive Session for Interpreted BASIC 

Assume that the routine SMART is in file FOO.ASM on disk B:. The 
following BASIC program is in a file USEFOO.BAS, also on disk B:. 

10 DEF SEG=&H1800
20 BLOAD "B:FOO.SAV",0
30 SMART%=0
40 FOOLISH%=1111
50 PRINT FOOLISH%
60 CALL SMART%(FOOLISH%)
70 PRINT FOOLISH%
80 END


A sample interactive session for loading FOO into interpreted BASIC 
follows. Your responses have been underlined. 

B>A:MASM FOO;

THE IBM PERSONAL COMPUTER MACRO ASSEMBLER
VERSION 1.00 (C)COPYRIGHT IBM CORP 1981

0004  E8 0007 R                 CALL    NEXT    ;12
ERROR --- 64:NEAR JMP/CALL TO DIFFERENT CS

WARNING SEVERE
ERRORS  ERRORS
0       1

B>A:LINK

IBM PERSONAL COMPUTER LINKER
VERSION 1.10 (C)COPYRIGHT IBM CORP 1982

OBJECT MODULES [.OBJ]: FOO/HIGH/MAP
RUN FILE [FOO.EXE]:
LIST FILE [NUL.MAP]: FOO;

WARNING: NO STACK SEGMENT

THERE WAS 1 ERROR DETECTED.

B>TYPE FOO.MAP
LOADING HIGH

WARNING: NO STACK SEGMENT

 START  STOP   LENGTH NAME                   CLASS
 00000H 0002AH 0002BH CSEG                   CODE
 00030H 00031H 00002H EXTRA_SEG              DATA

  ADDRESS         PUBLICS BY NAME

 0000:0000       SMART

  ADDRESS         PUBLICS BY VALUE

 0000:0000       SMART

B>A:DEBUG A:BASIC.COM
-R
AX=0000  BX=0000  CX=????  DX=0000  SP=FFFE  BP=0000  SI=0000  DI=0000
DS=0105  ES=0105  SS=0105  CS=0105  IP=0100   NV UP DI PL NZ NA PO NC
0105:0100 E9....
-N FOO.EXE
-L
-R
AX=FF47  BX=0000  CX=????  DX=0000  SP=0000  BP=0000  SI=0000  DI=0000
DS=0105  ES=0105  SS=4F14  CS=4F14  IP=0000   NV UP DI PL NZ NA PO NC
4F14:0000 55            PUSH    BP
-RSS
SS 4F14
:105
-RSP
SP 0000
:FFFE
-G=0105:0100

DIRECT STATEMENT IN FILE
DEF SEG=&H4F14
OK
BSAVE "FOO.SAV",0,&H31
OK
SYSTEM

PROGRAM TERMINATED NORMALLY

B>A:BASIC

THE IBM PERSONAL COMPUTER BASIC
VERSION D1.10 COPYRIGHT IBM CORP. 1981, 1982
61371 BYTES FREE
OK
LOAD "USEFOO"
OK
RUN
 1111
 0
OK


Interactive Session for Compiled BASIC

Assume that the routine SMART is in file FOO.ASM on disk B:. The 
following BASIC program is in a file USEFOO.BAS, also on disk B:. 

10 FOOLISH%=1111
20 PRINT FOOLISH%
30 CALL SMART(FOOLISH%)
40 PRINT FOOLISH%
50 END
A sample interactive session for loading FOO into compiled BASIC
follows. Your responses have been underlined.

B>A:MASM FOO;

THE IBM PERSONAL COMPUTER MACRO ASSEMBLER
VERSION 1.00 (C)COPYRIGHT IBM CORP 1981

0004  E8 0007 R                 CALL    NEXT    ;12
ERROR --- 64:NEAR JMP/CALL TO DIFFERENT CS

WARNING SEVERE
ERRORS  ERRORS
0       1

B>A:BASCOM USEFOO;

IBM PERSONAL COMPUTER BASIC COMPILER
(C)COPYRIGHT IBM CORP 1982 VERSION 1.00
(C)COPYRIGHT MICROSOFT, INC.

????? BYTES AVAILABLE
????? BYTES FREE

0 WARNING ERROR(S)
0 SEVERE ERROR(S)

B>A:LINK USEFOO+FOO;

IBM PERSONAL COMPUTER LINKER
VERSION 1.10 (C)COPYRIGHT IBM CORP 1982

B>USEFOO




Simple 8087 Routines 


Several fairly simple 8087 routines are presented in this chapter. The 
purpose of the presentation is twofold. First, the routines themselves are 
quite useful. For example, our first program can be called from BASIC 
to add up a series of numbers. Second, we illustrate a number of
principles of 8087 subroutine programming, including:

• Indexing through a single array. 

• Using single precision and double precision arithmetic. 

• Indexing through multiple arrays. 


The Cookbook—Chapter 9 


Program: SUM
Purpose: Sums up a single precision array.
Call: CALL SUM(ARRAY(0),N,DSUM).
Input: ARRAY—single precision array.
       N—integer number of elements of ARRAY.
Output: DSUM—double precision sum of ARRAY.
Language: 8087/8088 assembly language.

Program: PRODUCT
Purpose: Product of elements of a single precision array.
Call: CALL PRODUCT(ARRAY(0),N,DPRODUCT).
Input: ARRAY—single precision array.
       N—integer number of elements of ARRAY.
Output: DPRODUCT—double precision product of ARRAY.
Language: 8087/8088 assembly language.

Program: GSUM
Purpose: Sums up an integer, single, or double precision array.
Call: CALL GSUM(ARRAY(0),TYPE,N,SUM).
Input: ARRAY—array to be summed.
       TYPE—integer variable giving the length of one element of ARRAY.
       N—integer number of elements of ARRAY.
Output: SUM—double precision sum of ARRAY.
Language: 8087/8088 assembly language.

Program: VADD
Purpose: Adds two single precision vectors.
Call: CALL VADD(A(0),B(0),C(0),N).
Input: A—input array.
       B—input array.
       N—integer number of elements of A,B,C.
Output: C—output array, C = A + B.
Language: 8087/8088 assembly language.

Program: VADD3
Purpose: Adds three single precision vectors.
Call: CALL VADD3(A(0),B(0),C(0),D(0),N).
Input: A—input array.
       B—input array.
       C—input array.
       N—integer number of elements of A,B,C,D.
Output: D—output array, D = A + B + C.
Language: 8087/8088 assembly language.

Program: VSET
Purpose: Sets array to a constant.
Call: CALL VSET(A(0),SCALAR,N).
Input: SCALAR—single precision constant.
       N—integer number of elements of A.
Output: A—output array, A = SCALAR.
Language: 8087/8088 assembly language.

Program: ADDSC
Purpose: Adds scalar to single precision array.
Call: CALL ADDSC(A(0),SCALAR,B(0),N).
Input: A—input array.
       SCALAR—single precision constant.
       N—integer number of elements of A.
Output: B—output array, B = A + SCALAR.
Language: 8087/8088 assembly language.

Program: SQRT
Purpose: Takes square root of vector.
Call: CALL SQRT(A(0),B(0),N).
Input: A—input array.
       N—integer number of elements of A,B.
Output: B—output array, B = SQR(A).
Language: 8087/8088 assembly language.

Program: GCOPY
Purpose: Copies integer, single, or double precision array.
Call: CALL GCOPY(A(0),B(0),TYPE,N).
Input: A—input array.
       TYPE—integer giving length of element of A.
       N—integer number of elements of A,B.
Output: B—output array, B = A.
Language: 8087/8088 assembly language.

Program: GBCOPY
Purpose: Copies integer, single, or double precision array, backwards.
Call: CALL GBCOPY(A(0),B(0),TYPE,N).
Input: A—input array.
       TYPE—integer giving length of element of A.
       N—integer number of elements of A,B.
Output: B—output array, B = A.
Language: 8088 assembly language.

Program: GADDSAFR
Purpose: Adds two vectors, with error checking.
Call: CALL GADDSAFR(A(0),B(0),C(0),TYPEA,TYPEB,TYPEC,N,IER).
Input: A—input array.
       B—input array.
       TYPEA—integer giving length of element of A.
       TYPEB—integer giving length of element of B.
       TYPEC—integer giving length of element of C.
       N—integer number of elements of A,B,C.
Output: C—output array, C = A + B.
        IER—integer error indicator.
Language: 8087/8088 assembly language.

Program: REALERR
Purpose: Check array for invalid data.
Call: CALL REALERR(ARRAY(0),TYPE,N,IFDEN,IFINF,IFNAN,ELEMENT).
Input: ARRAY—input array (single or double precision).
       TYPE—integer giving length of element of ARRAY.
       N—integer number of elements of ARRAY.
Output: IFDEN—integer (-1 if denormal found).
        IFINF—integer (-1 if infinity found).
        IFNAN—integer (-1 if Not-a-Number found).
        ELEMENT—integer, index of last invalid data.
Language: 8087/8088 assembly language.

Program: DENTO0
Purpose: Replace denormal values with zero.
Call: CALL DENTO0(ARRAY(0),TYPE,N).
Input: ARRAY—input array (single or double precision).
       TYPE—integer giving length of element of ARRAY.
       N—integer number of elements of ARRAY.
Output: ARRAY—denormals replaced with 0.
Language: 8087/8088 assembly language.


Array Indexing 

Our first 8087 subroutine sums a series of numbers and returns their
total. Assuming that a single precision array named ARRAY, dimensioned
ARRAY(N-1), has been defined elsewhere, that N has been set
equal to the number of elements of ARRAY, and that DSUM is a double
precision variable, a BASIC instruction sequence might look like this:

10 DSUM=0
20 FOR I=0 TO N-1
30 DSUM=DSUM+ARRAY(I)
40 NEXT I

An equivalent 8087 routine appears below. Assuming the routine has 
been loaded into memory at the location SUM, we could call it from 
BASIC with the instruction 

10 CALL SUM(ARRAY(0),N,DSUM)

The 8087 routine has three logical sections. First, it must accept the
information passed to it from BASIC. Second, the routine calculates
the sum of the array elements. Third, the answer is passed back into the
BASIC variable DSUM. Notice that we execute an FWAIT before returning
from the subroutine. The FWAIT guarantees that the sum will have
reached memory before the calling program attempts to access it.

;SUBROUTINE SUM(ARRAY,N,DSUM)
; ASSUMPTIONS: ARRAY IS A SINGLE PRECISION ARRAY OF LENGTH N
;              N IS AN INTEGER
;              DSUM IS DOUBLE PRECISION
        PUBLIC  SUM
CSEG    SEGMENT 'CODE'
        ASSUME  CS:CSEG
SUM     PROC    FAR
        PUSH    BP
        MOV     BP,SP
        MOV     BX,[BP]+10      ;BX = ADDR(ARRAY)
        MOV     SI,[BP]+8       ;SI = ADDR(N)
        MOV     CX,[SI]         ;CX = N
;NOW ALL SET UP TO GO
        FLDZ                    ;INITIALIZE ST = 0
        CMP     CX,0H           ;HOPE N > 0
        JLE     DONE
;THE NEXT 3 INSTRUCTIONS DO ALL THE HARD WORK
ADD_LOOP:
        FADD    DWORD PTR [BX]  ;DWORD => SINGLE PRECISION
        ADD     BX,4            ;READY FOR NEXT ELEMENT
        LOOP    ADD_LOOP
DONE:
;NOW FILE ANSWER BACK IN DSUM
        MOV     DI,[BP]+6       ;DI = ADDR(DSUM)
        FSTP    QWORD PTR [DI]  ;QWORD => DOUBLE PRECISION
                                ;DSUM IS NOW PUT AWAY
        POP     BP
        FWAIT                   ;BE SURE 8087 IS DONE
        RET     6
SUM     ENDP
CSEG    ENDS
        END






How long does the addition routine take? Essentially all the execution
time is in the three-instruction ADD_LOOP. (Calling the subroutine
from BASIC, and executing the beginning and end of the routine, obviously
take a little time, but this overhead is inconsequential for large N.)

The FADD instruction takes approximately 25 microseconds. The ADD 
instruction uses approximately 1 microsecond. The LOOP instruction 
requires about 4 microseconds. Thus the routine should take about 30 
microseconds per array element. Right? 

Wrong, actually. The 8087 and 8088 run in parallel. So, while the 8087 
is adding one number, the 8088 is adding to the BX register, decrementing 
the count in register CX, testing CX, and looping back up. Hence, the 
routine takes about 25 microseconds per element. Adding 10,000 single 
precision numbers takes just under one-fourth of a second. How long 
would a comparable BASIC routine take? Without the 8087, about 46 
seconds. 
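The arithmetic behind "30 versus 25 microseconds" can be written out as a rough model. It is a sketch that uses only the approximate timings quoted above and assumes the 8088's bookkeeping for one element fits entirely inside the 8087's FADD; the function name `per_element_us` is hypothetical.

```python
# Sketch: per-element time for ADD_LOOP, serial versus overlapped.
# The microsecond figures are the approximate ones quoted in the text.
FADD_US, ADD_US, LOOP_US = 25.0, 1.0, 4.0


def per_element_us(overlapped: bool) -> float:
    cpu_work = ADD_US + LOOP_US          # 8088 bookkeeping for one element
    if overlapped:
        # The 8088 finishes its bookkeeping while the 8087 is still adding,
        # so the slower of the two sets the pace.
        return max(FADD_US, cpu_work)
    return FADD_US + cpu_work


n = 10_000
print(per_element_us(False) * n / 1e6)   # naive serial estimate, seconds
print(per_element_us(True) * n / 1e6)    # overlapped estimate, seconds
```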

In addition to the speed advantage, the 8087 produces a more accurate 
answer because it accumulates in 80-bit temporary real format rather than 
64-bit double precision. 

SUM is quite a useful subroutine. Of more general importance, SUM
illustrates how to write a routine that indexes through a single array. We
use a three-part trick. First, we load the address of the array into a
convenient base or index register (we could have used SI or DI instead of
BX) and the count into the CX register. Second, we add the element length
(four bytes for single precision) to BX at each step. Third, we use the
LOOP instruction to count off the steps.
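The three-part trick can be mirrored step by step on a byte buffer. This sketch assumes the array is stored as little-endian 4-byte single precision values (the `struct` format `"<f"`); Python accumulates in 64-bit floats rather than the 8087's 80-bit format, and `sum_single` is a hypothetical name.

```python
import struct


def sum_single(memory: bytes, base: int, n: int) -> float:
    """Model of SUM: walk n single precision (4-byte) values starting at base."""
    total = 0.0                  # FLDZ (64-bit here, not the 8087's 80 bits)
    if n <= 0:                   # CMP CX,0H / JLE DONE
        return total
    bx, cx = base, n             # BX = ADDR(ARRAY), CX = N
    while True:
        (value,) = struct.unpack_from("<f", memory, bx)   # FADD DWORD PTR [BX]
        total += value
        bx += 4                  # ADD BX,4
        cx -= 1                  # LOOP ADD_LOOP: decrement CX ...
        if cx == 0:              # ... and fall through when it reaches zero
            break
    return total


array = struct.pack("<4f", 1.5, 2.25, -0.75, 10.0)
print(sum_single(array, 0, 4))
```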

Operations other than addition are easily written using the same
procedure. For example, to take the product of an array of numbers we could
do:
10 DPRODUCT=1
20 FOR I=0 TO N-1
30 DPRODUCT=DPRODUCT*ARRAY(I)
40 NEXT I

or: 


;SUBROUTINE PRODUCT(ARRAY,N,DPRODUCT)
; ASSUMPTIONS: ARRAY IS A SINGLE PRECISION ARRAY OF LENGTH N
;              N IS AN INTEGER
;              DPRODUCT IS DOUBLE PRECISION
        PUBLIC  PRODUCT
CSEG    SEGMENT 'CODE'
        ASSUME  CS:CSEG
PRODUCT PROC    FAR
        PUSH    BP
        MOV     BP,SP
        MOV     BX,[BP]+10      ;BX = ADDR(ARRAY)
        MOV     SI,[BP]+8       ;SI = ADDR(N)
        MOV     CX,[SI]         ;CX = N
;NOW ALL SET UP TO GO
        FLD1                    ;INITIALIZE ST = 1
                                ;FLD1 PUSHES A 1, JUST AS
                                ;FLDZ PUSHES A 0
        CMP     CX,0H           ;HOPE N > 0
        JLE     DONE            ;IF NOT, RETURN 1
MULT_LOOP:
        FMUL    DWORD PTR [BX]
        ADD     BX,4            ;READY FOR NEXT ELEMENT
        LOOP    MULT_LOOP
DONE:
;NOW FILE ANSWER BACK IN DPRODUCT
        MOV     DI,[BP]+6       ;DI = ADDR(DPRODUCT)
        FSTP    QWORD PTR [DI]  ;DPRODUCT IS NOW PUT AWAY
        POP     BP
        FWAIT                   ;BE SURE 8087 IS DONE
        RET     6
PRODUCT ENDP
CSEG    ENDS
        END




The FMUL instruction takes approximately 29 microseconds. Multiplying
10,000 single precision numbers takes just over one-fourth of a
second. A comparable BASIC routine takes about 56 seconds. Under some
circumstances, the accuracy of the 8087 PRODUCT subroutine will
considerably exceed the accuracy of the equivalent BASIC code. The 8087
temporary real exponent allows a much greater range than the double
precision exponent, so intermediate overflows or underflows are much
less likely to occur with the 8087 routine.
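The exponent-range point can be demonstrated with a product whose intermediate value overflows a double but whose final value is modest. This is a sketch: Python's `decimal` module, configured with about 19 significant digits and exponents up to 4932, serves only as a stand-in for the 8087's temporary real range and is not an emulation of the 8087.

```python
from decimal import Context, Decimal

factors = [1e200, 1e200, 1e-200, 1e-200]

# In IEEE double precision the partial product 1e400 overflows to infinity,
# and multiplying infinity by 1e-200 cannot bring it back.
double_product = 1.0
for f in factors:
    double_product *= f

# Stand-in (an assumption, not an 8087 emulation) for the temporary real
# format: about 19 significant digits and decimal exponents up to 4932.
wide = Context(prec=19, Emax=4932, Emin=-4931)
wide_product = Decimal(1)
for f in factors:
    wide_product = wide.multiply(wide_product, Decimal(repr(f)))

print(double_product, wide_product)
```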
Double Precision Arguments

The choice between single precision and double precision arithmetic
requires a tradeoff between accuracy and memory space. Double precision
numbers take up twice as much space as single precision numbers, but
carry somewhat more than twice as many significant digits. Good numerical
programming practice dictates using double precision throughout.
Unfortunately, because of storage limitations this is rarely practical. In
fact, there is a "folk theorem" to the effect that problem size expands to
use up all available space. The following stages of compromise are recommended:

1. If the problem can be done entirely in double precision, do it that 
way. 

2. Hold raw input and final results in single precision, and everything
else in double precision. There is little loss in storing original input
in single precision: real data can rarely be measured with the seven
significant digits provided by single precision storage. The problem
with single precision is the loss of accuracy from cumulative
errors. Doing all the calculations in double precision is almost as
good as holding everything in double precision.

3. Retain critical intermediate steps in double precision. Delay
conversion into single precision as long as possible.

Of course, the 8087's 80-bit temporary real format is even more accurate 
than double precision. The most accurate answers are found by doing 
as many intermediate calculations as possible within the 8087, storing 
only final results in memory. 

In practice, programs use both single and double precision. One
advantage of BASIC is that programs "know" whether variables are single
or double precision. Our 8087 routines need to be told. There are two
ways, both valuable, to "tell" our routines what precision to use. First,
we can write separate routines, one for single and one for double
precision. Second, we can write routines which handle both cases and
include an extra argument to tell the routine which type of data is being
used. The first is easier to write, but the flexibility of the second is
sometimes worth the extra effort.

Changing a single precision routine to double precision requires only
two simple steps: change the 8087 instructions to reference double
precision memory, and change the step size to eight rather than four bytes.
Thus, we can change subroutine SUM into a double precision subroutine
DSUM with the following amendments:

FADD QWORD PTR [BX]    instead of    FADD DWORD PTR [BX]

and

ADD BX,8               instead of    ADD BX,4

The second approach to the problem of variable precision is to pass
the needed information on to the subroutine. As long as we're solving
this problem, we may as well make things a bit more general. Subroutine
GSUM accepts a single precision, double precision, or integer vector.

;SUBROUTINE GSUM(ARRAY,TYPE,N,SUM)
; ASSUMPTIONS: ARRAY IS AN ARRAY OF LENGTH N
;              TYPE IS AN INTEGER: 2=INTEGER 4=SINGLE 8=DOUBLE
;              N IS AN INTEGER
;              SUM IS DOUBLE PRECISION
        PUBLIC  GSUM
CSEG    SEGMENT 'CODE'
        ASSUME  CS:CSEG
GSUM    PROC    FAR
        PUSH    BP
        MOV     BP,SP
        MOV     BX,[BP]+12      ;BX = ADDR(ARRAY)
        MOV     SI,[BP]+10      ;SI = ADDR(TYPE)
        MOV     AX,[SI]         ;AX = TYPE
        MOV     SI,[BP]+8       ;SI = ADDR(N)
        MOV     CX,[SI]         ;CX = N
;NOW ALL SET UP TO GO
        FLDZ                    ;INITIALIZE ST = 0
        CMP     CX,0H           ;HOPE N > 0
        JLE     DONE
ADD_LOOP:
        CMP     AX,2            ;IS IT INTEGER?
        JNE     NOT_INTEGER
        FIADD   WORD PTR [BX]
        JMP     NEXT_ELEMENT
NOT_INTEGER:
        CMP     AX,4            ;IS IT SINGLE?
        JNE     NOT_SINGLE
        FADD    DWORD PTR [BX]
        JMP     NEXT_ELEMENT
NOT_SINGLE:                     ;BETTER BE DOUBLE
        FADD    QWORD PTR [BX]
NEXT_ELEMENT:
        ADD     BX,AX           ;READY FOR NEXT ELEMENT
        LOOP    ADD_LOOP
DONE:
;NOW FILE ANSWER BACK IN SUM
        MOV     DI,[BP]+6       ;DI = ADDRESS OF SUM
        FSTP    QWORD PTR [DI]  ;SUM IS NOW PUT AWAY
        POP     BP
        FWAIT                   ;BE SURE 8087 IS DONE
        RET     8
GSUM    ENDP
CSEG    ENDS
        END
Subroutine GSUM will accept any of the three BASIC numeric variable
types. GSUM is slightly more complex than SUM, and we have to pass
it one extra argument. It may look as if GSUM will also be slower, since
it has to check TYPE each time through and also jump around extra
instructions. However, the comparison and jump take only about five
microseconds, so the 8088 executes these instructions while the 8087 is
working on the addition.
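The dispatch on TYPE can be modeled in a few lines. This sketch assumes the same element lengths as the listing (2 = integer, 4 = single, 8 = double) mapped to little-endian `struct` formats; like the assembly, it treats any other TYPE as double but still steps by TYPE bytes. The names `FORMATS` and `gsum` are hypothetical.

```python
import struct

# Assumed mapping from GSUM's TYPE argument (bytes per element) to a struct
# format: 2 = integer, 4 = single precision, 8 = double precision.
FORMATS = {2: "<h", 4: "<f", 8: "<d"}


def gsum(memory: bytes, base: int, type_len: int, n: int) -> float:
    """Model of GSUM: SUM's loop, with the element size taken from TYPE."""
    fmt = FORMATS.get(type_len, "<d")    # like NOT_SINGLE: "better be double"
    total = 0.0                          # FLDZ
    bx = base                            # BX = ADDR(ARRAY)
    for _ in range(max(n, 0)):           # CMP CX,0H / JLE DONE ... LOOP ADD_LOOP
        (value,) = struct.unpack_from(fmt, memory, bx)   # FIADD or FADD
        total += value
        bx += type_len                   # ADD BX,AX
    return total


ints = struct.pack("<3h", 7, -2, 5)
singles = struct.pack("<2f", 1.5, 2.5)
doubles = struct.pack("<2d", 0.5, 0.25)
print(gsum(ints, 0, 2, 3), gsum(singles, 0, 4, 2), gsum(doubles, 0, 8, 2))
```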

If you have a recent version of BASIC, you can "automate" passing 
the TYPE to GSUM by using the VARPTR$ function. For example: 

DESCRIPTOR$=VARPTR$(ARRAY(0))   'INTERNAL DESCRIPTION OF ARRAY
TYPE$=LEFT$(DESCRIPTOR$,1)      'FIRST CHARACTER IS TYPE
TYPE%=ASC(TYPE$)                'NEED INTEGER 2, 4, OR 8
CALL GSUM%(ARRAY(0),TYPE%,N%,SUM)


Indexing Through Multiple Arrays 

In the routines above, we used the BX register to index ARRAY. This 
procedure works with a single array, but more complicated problems 
may require us to keep track of several indexes. Up to three indexes may 
be kept in the registers BX, SI, and DI. In addition, registers AX, DX, 
and CX are convenient for holding temporary values. 

Our next subroutine adds two single precision vectors, returning a
single precision vector result.

;SUBROUTINE VADD(A,B,C,N)
; ASSUMPTIONS: A,B,C ARE SINGLE PRECISION ARRAYS OF LENGTH N
;              N IS AN INTEGER
        PUBLIC  VADD
CSEG    SEGMENT 'CODE'
        ASSUME  CS:CSEG
VADD    PROC    FAR
        PUSH    BP
        MOV     BP,SP
        MOV     SI,[BP]+6       ;SI = ADDR(N)
        MOV     CX,[SI]         ;CX = N
        MOV     BX,[BP]+12      ;BX = ADDR(A)
        MOV     SI,[BP]+10      ;SI = ADDR(B)
        MOV     DI,[BP]+8       ;DI = ADDR(C)
;NOW ALL SET UP TO GO
        CMP     CX,0H           ;HOPE N > 0
        JLE     DONE
ADD_LOOP:
        FLD     DWORD PTR [BX]  ;LOAD A(I)
        ADD     BX,4            ;READY FOR NEXT A
        FADD    DWORD PTR [SI]  ;ADD B(I)
        ADD     SI,4            ;READY FOR NEXT ELEMENT
        FSTP    DWORD PTR [DI]  ;C(I)=A(I)+B(I)
        ADD     DI,4            ;READY FOR NEXT C
        LOOP    ADD_LOOP
DONE:
        POP     BP
        FWAIT                   ;BE SURE 8087 IS DONE
        RET     8
VADD    ENDP
CSEG    ENDS
        END



Subroutine VADD requires just over half a second to add two 10,000-element
vectors. Note that while we have specified three vectors, nothing
prevents A and B, or A and C, or even all three from being the same
vector. Thus the command CALL VADD(A(0),A(0),A(0),N) doubles each
element of A.

Creating routines to perform subtraction, multiplication, and division
requires us only to change the 8087 addition instruction to an 8087
subtraction, multiplication, or division instruction. Thus we can change one
line in VADD:

C=A+B : "FADD DWORD PTR [SI]"

to make VSUB:

C=A-B : "FSUB DWORD PTR [SI]"

or to make VMULT:

C=A*B : "FMUL DWORD PTR [SI]"

or to make VDIV:

C=A/B : "FDIV DWORD PTR [SI]"

(Note VMULT and VDIV perform element-by-element operations, not
"matrix operations.")
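One model covers the whole VADD family, with the 8087 instruction as a parameter, and also shows why the aliased call works: each element is read before its result is written. This is a sketch, not the author's code. The function name `vop` is hypothetical, the byte offsets stand in for BX, SI, and DI, and Python computes in 64-bit before rounding to single, so the last bit can differ from the 8087's result in some cases.

```python
import operator
import struct


def vop(memory: bytearray, a: int, b: int, c: int, n: int, op=operator.add) -> None:
    """Model of VADD/VSUB/VMULT/VDIV: C(I) = A(I) op B(I), single precision.

    a, b, c are byte offsets into memory standing in for BX, SI and DI.
    """
    bx, si, di = a, b, c
    for _ in range(max(n, 0)):
        (x,) = struct.unpack_from("<f", memory, bx)    # FLD DWORD PTR [BX]
        bx += 4
        (y,) = struct.unpack_from("<f", memory, si)    # FADD (or FSUB ...) DWORD PTR [SI]
        si += 4
        struct.pack_into("<f", memory, di, op(x, y))   # FSTP DWORD PTR [DI]
        di += 4


mem = bytearray(struct.pack("<3f", 1.0, 2.5, -4.0))
vop(mem, 0, 0, 0, 3)          # like CALL VADD(A(0),A(0),A(0),N): doubles A
print(struct.unpack("<3f", mem))

mem2 = bytearray(struct.pack("<3f", 3.0, 1.0, 0.0))   # A, B, C of length 1
vop(mem2, 0, 4, 8, 1, operator.sub)                   # VSUB: C = A - B
```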

The same technique we used for changing SUM into GSUM can be used to change VADD into a routine for single precision, double precision, or integer vector addition.
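The one-line changes that turn VADD into VSUB, VMULT, and VDIV suggest a table-driven view: the loop stays the same and only the arithmetic operation varies. The following Python sketch makes that explicit and shows that VMULT and VDIV work element by element. The names `vector_op` and `VECTOR_OPS` are illustrative, not from the book. One difference from the 8087 is that Python raises ZeroDivisionError for division by zero instead of returning infinity.

```python
import operator
import struct

def to_single(x):
    """Round a Python float to single precision, as an FSTP DWORD PTR store would."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Only the arithmetic operation changes between VADD, VSUB, VMULT and VDIV,
# just as only the FADD line changes in the assembly listing.
VECTOR_OPS = {
    "VADD": operator.add,
    "VSUB": operator.sub,
    "VMULT": operator.mul,
    "VDIV": operator.truediv,
}

def vector_op(name, a, b, c, n):
    """Element-by-element C(I) = A(I) op B(I); not a matrix operation."""
    op = VECTOR_OPS[name]
    for i in range(n):
        c[i] = to_single(op(a[i], b[i]))

a = [6.0, 8.0]
b = [2.0, 4.0]
c = [0.0, 0.0]
vector_op("VMULT", a, b, c, 2)
print(c)  # [12.0, 32.0]
vector_op("VDIV", a, b, c, 2)
print(c)  # [3.0, 2.0]
```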

Once we have more than three vectors, we run out of index registers. We can program around this limit by using the 8088's ability to double index, that is, to address memory as the sum of a base register and an index register, as in [BX][SI]. In the next program, the address of each array is loaded into BX just before we need to reference the array. The array element is indexed in SI. Routine VADD3 adds three single precision vectors and returns the result in a fourth.
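To make the base-plus-index addressing of VADD3 concrete, the sketch below simulates it in Python. A bytearray stands in for memory, and the array base addresses are made up for illustration. The variable `bx` plays the role of BX and is reloaded before each array reference, while `si` is the single byte offset shared by all four arrays.

```python
import struct

# A small flat byte array standing in for the data segment (illustration only).
memory = bytearray(64)

def store_single(addr, value):
    """FSTP DWORD PTR: write a single precision value at a byte address."""
    struct.pack_into("<f", memory, addr, value)

def load_single(addr):
    """FLD DWORD PTR: read a single precision value from a byte address."""
    return struct.unpack_from("<f", memory, addr)[0]

# Hypothetical base addresses of A, B, C and D (BASIC would pass the real ones).
ADDR = {"A": 0, "B": 16, "C": 32, "D": 48}
N = 4
for i in range(N):
    store_single(ADDR["A"] + 4 * i, float(i))
    store_single(ADDR["B"] + 4 * i, 10.0)
    store_single(ADDR["C"] + 4 * i, 100.0)

si = 0                                  # SI: one byte offset shared by every array
for _ in range(N):                      # LOOP ADD_LOOP
    bx = ADDR["A"]                      # MOV BX,ADDR(A)
    st = load_single(bx + si)           # FLD  DWORD PTR [BX][SI]
    bx = ADDR["B"]
    st += load_single(bx + si)          # FADD DWORD PTR [BX][SI]
    bx = ADDR["C"]
    st += load_single(bx + si)          # FADD DWORD PTR [BX][SI]
    bx = ADDR["D"]
    store_single(bx + si, st)           # FSTP DWORD PTR [BX][SI]
    si += 4                             # ADD SI,4

print([load_single(ADDR["D"] + 4 * i) for i in range(N)])  # [110.0, 111.0, 112.0, 113.0]
```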

;SUBROUTINE VADD3(A,B,C,D,N)
; ASSUMPTIONS: A,B,C,D ARE SINGLE PRECISION ARRAYS OF LENGTH N
;              N IS AN INTEGER

        PUBLIC  VADD3
CSEG    SEGMENT 'CODE'
        ASSUME  CS:CSEG
VADD3   PROC    FAR
        PUSH    BP
        MOV     BP,SP
        MOV     SI,[BP]+6       ;SI=ADDR(N)
        MOV     CX,[SI]         ;CX=N
        MOV     SI,0            ;SI=0
;NOW ALL SET UP TO GO
        CMP     CX,0H           ;HOPE N > 0
        JLE     DONE
ADD_LOOP:
        MOV     BX,[BP]+14      ;BX=ADDR(A)
        FLD     DWORD PTR [BX][SI]      ;LOAD A(I)
        MOV     BX,[BP]+12      ;BX=ADDR(B)
        FADD    DWORD PTR [BX][SI]      ;ADD B(I)
        MOV     BX,[BP]+10      ;BX=ADDR(C)
        FADD    DWORD PTR [BX][SI]      ;ADD C(I)
        MOV     BX,[BP]+8       ;BX=ADDR(D)
        FSTP    DWORD PTR [BX][SI]      ;D(I)=A(I)+B(I)+C(I)
        ADD     SI,4            ;READY FOR NEXT ELEMENT
        LOOP    ADD_LOOP
DONE:
        POP     BP
        FWAIT                   ;BE SURE 8087 IS DONE
        RET     10
VADD3   ENDP
CSEG    ENDS
        END



Scalar Routines 




Mathematical operations frequently involve a scalar and a vector. ("Scalar" is the word mathematicians use for a single number, as opposed to an entire vector of numbers.) The simplest example would be setting an entire vector to a constant, as in A = 5. Subroutine VSET performs this service. VSET first loads the value SCALAR onto the 8087 stack and then copies the 8087 register ST into each element of A.

;SUBROUTINE VSET(A,SCALAR,N)
; ASSUMPTIONS: A IS A SINGLE PRECISION ARRAY OF LENGTH N
;              SCALAR IS SINGLE PRECISION
;              N IS AN INTEGER

        PUBLIC  VSET
CSEG    SEGMENT 'CODE'
        ASSUME  CS:CSEG
VSET    PROC    FAR
        PUSH    BP
        MOV     BP,SP
        MOV     SI,[BP]+6       ;SI=ADDR(N)
        MOV     CX,[SI]         ;CX=N
        MOV     BX,[BP]+10      ;BX=ADDR(A)
        MOV     SI,[BP]+8       ;SI=ADDR(SCALAR)
        FLD     DWORD PTR [SI]  ;PUSH SCALAR ONTO STACK
;NOW ALL SET UP TO GO
        CMP     CX,0H           ;HOPE N > 0
        JLE     DONE
VSET_LOOP:
        FST     DWORD PTR [BX]  ;STORE A(I)
        ADD     BX,4            ;READY FOR NEXT A
        LOOP    VSET_LOOP
DONE:
        FSTP    ST(0)           ;GET RID OF SCALAR
        POP     BP
        FWAIT                   ;BE SURE 8087 IS DONE
        RET     6
VSET    ENDP
CSEG    ENDS
        END





A typical mathematical operation is to add a scalar to every element of 
a vector. Routine ADDSC performs this function. 

;SUBROUTINE ADDSC(A,SCALAR,B,N)
; ASSUMPTIONS: A,B ARE SINGLE PRECISION ARRAYS OF LENGTH N
;              SCALAR IS SINGLE PRECISION
;              N IS AN INTEGER

        PUBLIC  ADDSC
CSEG    SEGMENT 'CODE'
        ASSUME  CS:CSEG
ADDSC   PROC    FAR
        PUSH    BP
        MOV     BP,SP
        MOV     SI,[BP]+6       ;SI=ADDR(N)
        MOV     CX,[SI]         ;CX=N
        MOV     BX,[BP]+12      ;BX=ADDR(A)
        MOV     SI,[BP]+10      ;SI=ADDR(SCALAR)
        FLD     DWORD PTR [SI]  ;PUSH SCALAR ONTO STACK
        MOV     SI,[BP]+8       ;SI=ADDR(B)
;NOW ALL SET UP TO GO
        CMP     CX,0H           ;HOPE N > 0
        JLE     DONE
ADD_LOOP:
        FLD     DWORD PTR [BX]  ;LOAD A(I)
        ADD     BX,4            ;READY FOR NEXT A
        FADD    ST,ST(1)        ;ADD SCALAR
        FSTP    DWORD PTR [SI]  ;B(I)=A(I)+SCALAR
        ADD     SI,4            ;READY FOR NEXT B
        LOOP    ADD_LOOP
DONE:
        FSTP    ST(0)           ;GET RID OF SCALAR
        POP     BP
        FWAIT                   ;BE SURE 8087 IS DONE
        RET     8
ADDSC   ENDP
CSEG    ENDS
        END




Adapting ADDSC for subtraction, multiplication, and division is straightforward. (Remember, of course, that "A-SCALAR" is quite different from "SCALAR-A"!)
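A Python sketch of how ADDSC uses the 8087 stack, adapted for subtraction, may help with the warning above. A list models the register stack, with its last item as ST. The scalar is pushed once, each A(I) is pushed on top of it, and the result is popped into B(I). The `reverse` flag chooses between FSUB ST,ST(1), giving A-SCALAR, and FSUBR ST,ST(1), giving SCALAR-A. The function name `scalar_op` is hypothetical.

```python
import struct

def to_single(x):
    """Round to single precision, as FSTP DWORD PTR would."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def scalar_op(a, scalar, b, n, reverse=False):
    """Model of ADDSC adapted for subtraction.

    reverse=False: B(I) = A(I) - SCALAR   (FSUB  ST,ST(1))
    reverse=True:  B(I) = SCALAR - A(I)   (FSUBR ST,ST(1))
    The list `stack` mimics the 8087 register stack; stack[-1] is ST.
    """
    stack = [scalar]                 # FLD DWORD PTR [SI]: push SCALAR once
    for i in range(n):
        stack.append(a[i])           # FLD DWORD PTR [BX]: ST=A(I), ST(1)=SCALAR
        if reverse:
            stack[-1] = stack[-2] - stack[-1]   # ST = ST(1) - ST
        else:
            stack[-1] = stack[-1] - stack[-2]   # ST = ST - ST(1)
        b[i] = to_single(stack.pop())           # FSTP DWORD PTR [SI]
    stack.pop()                      # FSTP ST(0): get rid of SCALAR
    assert not stack                 # the stack is left as we found it

a = [1.0, 2.0, 3.0]
b = [0.0] * 3
scalar_op(a, 10.0, b, 3)
print(b)                 # [-9.0, -8.0, -7.0]
scalar_op(a, 10.0, b, 3, reverse=True)
print(b)                 # [9.0, 8.0, 7.0]
```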

Unary Operations 

Operations requiring only one argument are said to be "unary" (as opposed to two-argument "binary" operations such as "A+B"). For example, we might want to find the square root, absolute value, or negative of the elements of an array. Routine SQRT, which we used for timing examples in Part I (Chapters 1-4), computes B = SQR(A).

;SUBROUTINE SQRT(A,B,N)
; ASSUMPTIONS: A,B ARE SINGLE PRECISION ARRAYS OF LENGTH N
;              N IS AN INTEGER

        PUBLIC  SQRT
CSEG    SEGMENT 'CODE'
        ASSUME  CS:CSEG
SQRT    PROC    FAR
        PUSH    BP
        MOV     BP,SP
        MOV     SI,[BP]+6       ;SI=ADDR(N)
        MOV     CX,[SI]         ;CX=N
        MOV     BX,[BP]+10      ;BX=ADDR(A)
        MOV     SI,[BP]+8       ;SI=ADDR(B)
;NOW ALL SET UP TO GO
        CMP     CX,0H           ;HOPE N > 0
        JLE     DONE
SQRT_LOOP:
        FLD     DWORD PTR [BX]  ;LOAD A(I)
        ADD     BX,4            ;READY FOR NEXT A
        FSQRT                   ;FIND SQRT(A(I))
        FSTP    DWORD PTR [SI]  ;B(I)=SQRT(A(I))
        ADD     SI,4            ;READY FOR NEXT B
        LOOP    SQRT_LOOP
DONE:
        POP     BP
        FWAIT                   ;BE SURE 8087 IS DONE
        RET     6
SQRT    ENDP
CSEG    ENDS
        END

Routine SQRT is easily changed to compute absolute value or to yield 
the negative of the input vector by changing FSQRT to FABS or FCHS. 
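Because only one instruction differs among these unary routines, a Python sketch can select it from a table. The mapping below is an illustration under stated assumptions. For a negative argument it returns a NAN, which is what the 8087 produces for FSQRT when the invalid-operation exception is masked; Python's `math.sqrt` would raise ValueError instead. The names `unary` and `UNARY` are hypothetical.

```python
import math
import struct

def to_single(x):
    """Round to single precision, as FSTP DWORD PTR would."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def fsqrt(x):
    # Masked invalid operation on the 8087 yields a NAN rather than an error.
    return math.nan if x < 0 else math.sqrt(x)

UNARY = {"FSQRT": fsqrt, "FABS": abs, "FCHS": lambda x: -x}

def unary(instr, a, b, n):
    """B(I) = f(A(I)), where f is the chosen 8087 instruction."""
    f = UNARY[instr]
    for i in range(n):
        b[i] = to_single(f(a[i]))

a = [4.0, -9.0, 2.25]
b = [0.0] * 3
unary("FABS", a, b, 3)
print(b)   # [4.0, 9.0, 2.25]
unary("FSQRT", a, b, 3)
print(b)   # [2.0, nan, 1.5]
```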


Utility Routines

The speed of the routines above reflects both the 8087's prodigious mathematical ability and the vast speed advantage of 8088 assembly language code over BASIC. It can be very useful to use assembly language routines even for such "non-computational" tasks as copying one array of numbers into another. As a bonus, we can use the 8087's automatic precision conversion to allow transfers between single precision, double precision, and integer arrays.

The BASIC code 

10 DIM A(4999),B(4999)
20 N%=5000
30 FOR I=0 TO N%-1
40 B(I)=A(I)
50 NEXT I

takes about 18 seconds or more to execute, even if we rewrite the code 
all on one line, for maximum efficiency (and minimum clarity). We would 
actually be better off with the code 

10 DIM A(4999),B(4999)
20 N%=5000:SCALAR=0
30 CALL ADDSC(A(0),SCALAR,B(0),N%)

which would only take about a quarter of a second, despite its 5,000 
useless addition operations! For greater convenience, we create a routine 
GCOPY that not only copies one array into another, but also handles 
type conversions for us. 
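The conversions GCOPY performs can be sketched in Python with the standard `struct` module. Byte strings stand in for BASIC arrays, and the type codes 2, 4, and 8 select 16-bit integer, single, and double formats. This is a model under stated assumptions, not the routine itself. The helper `to_int16` assumes the 8087's default round-to-nearest-even mode and returns -32,768 when a value cannot be represented, and a value too large for single precision is stored as a signed infinity. The names `gcopy`, `to_int16`, and `FORMATS` are illustrative.

```python
import math
import struct

FORMATS = {2: "<h", 4: "<f", 8: "<d"}   # GCOPY type codes: 2-INTEGER 4-SINGLE 8-DOUBLE

def to_int16(x):
    """Round to nearest (ties to even) and return a 16-bit integer; anything
    that will not fit becomes -32768, the value the 8087 reports."""
    if math.isnan(x) or math.isinf(x):
        return -32768
    r = round(x)                  # Python's round() also breaks ties to even
    return r if -32768 <= r <= 32767 else -32768

def gcopy(a_bytes, type_a, type_b, n):
    """Copy N elements of type TYPEA from the byte image a_bytes into a new
    byte image of type TYPEB, converting through one common value (ST)."""
    out = bytearray(n * type_b)
    for i in range(n):
        (st,) = struct.unpack_from(FORMATS[type_a], a_bytes, i * type_a)  # FILD/FLD
        if type_b == 2:
            st = to_int16(st)                                             # FISTP
        try:
            struct.pack_into(FORMATS[type_b], out, i * type_b, st)        # FSTP
        except OverflowError:
            # Too large for single precision: store a signed infinity,
            # the 8087's masked overflow response.
            struct.pack_into(FORMATS[type_b], out, i * type_b,
                             math.copysign(math.inf, st))
    return bytes(out)

singles = gcopy(struct.pack("<3h", 1, -2, 300), 2, 4, 3)
print(struct.unpack("<3f", singles))   # (1.0, -2.0, 300.0)
ints = gcopy(struct.pack("<3d", 2.5, 3.5, 1e6), 8, 2, 3)
print(struct.unpack("<3h", ints))      # (2, 4, -32768)
```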


;SUBROUTINE GCOPY(A,B,TYPEA,TYPEB,N)
; ASSUMPTIONS: A,B ARE ARRAYS OF LENGTH N
;              TYPEA IS AN INTEGER: 2-INTEGER 4-SINGLE 8-DOUBLE
;              TYPEB IS AN INTEGER: SAME CODES AS TYPEA
;              N IS AN INTEGER

        PUBLIC  GCOPY
CSEG    SEGMENT 'CODE'
        ASSUME  CS:CSEG
GCOPY   PROC    FAR
        PUSH    BP
        MOV     BP,SP
        MOV     SI,[BP]+10      ;SI=ADDR(TYPEA)
        MOV     AX,[SI]         ;AX=TYPEA
        MOV     SI,[BP]+8       ;SI=ADDR(TYPEB)
        MOV     DX,[SI]         ;DX=TYPEB
        MOV     SI,[BP]+6       ;SI=ADDR(N)
        MOV     CX,[SI]         ;CX=N
        MOV     BX,[BP]+14      ;BX=ADDR(A)
        MOV     SI,[BP]+12      ;SI=ADDR(B)
;NOW ALL SET UP TO GO
        CMP     CX,0H           ;HOPE N > 0
        JLE     DONE
COPY_LOOP:
        CMP     AX,2            ;IS A INTEGER?
        JNE     A_NOT_INTEGER
        FILD    WORD PTR [BX]
        JMP     STORE_IT
A_NOT_INTEGER:
        CMP     AX,4            ;IS A SINGLE?
        JNE     A_NOT_SINGLE
        FLD     DWORD PTR [BX]
        JMP     STORE_IT
A_NOT_SINGLE:                   ;BETTER BE DOUBLE
        FLD     QWORD PTR [BX]
STORE_IT:
        ADD     BX,AX           ;READY FOR NEXT ELEMENT
        CMP     DX,2            ;IS B INTEGER?
        JNE     B_NOT_INTEGER
        FISTP   WORD PTR [SI]
        JMP     LOOP_END
B_NOT_INTEGER:
        CMP     DX,4            ;IS B SINGLE?
        JNE     B_NOT_SINGLE
        FSTP    DWORD PTR [SI]
        JMP     LOOP_END
B_NOT_SINGLE:                   ;BETTER BE DOUBLE
        FSTP    QWORD PTR [SI]
LOOP_END:
        ADD     SI,DX           ;READY FOR NEXT ELEMENT
        LOOP    COPY_LOOP
DONE:
        POP     BP
        FWAIT                   ;BE SURE 8087 IS DONE
        RET     10
GCOPY   ENDP
CSEG    ENDS
        END

GCOPY is about 100 times faster than the equivalent BASIC code. 

Our second utility routine is GBCOPY. GBCOPY is like GCOPY, except that it begins copying at A(N-1) and works down to A(0), rather than vice versa, and that GBCOPY does not perform type conversions.

;SUBROUTINE GBCOPY(A,B,TYPE,N)
; ASSUMPTIONS: A,B ARE ARRAYS OF LENGTH N
;              TYPE IS AN INTEGER: 2-INTEGER 4-SINGLE 8-DOUBLE
;              N IS AN INTEGER

        PUBLIC  GBCOPY
CSEG    SEGMENT 'CODE'
        ASSUME  CS:CSEG
GBCOPY  PROC    FAR
        PUSH    BP
        MOV     BP,SP
        MOV     BX,[BP]+6       ;BX=ADDR(N)
        MOV     CX,[BX]         ;CX=N
        CMP     CX,0
        JLE     DONE
        MOV     BX,[BP]+8       ;BX=ADDR(TYPE)
        MOV     AX,[BX]         ;AX=TYPE
        MUL     CX              ;AX=N*TYPE
        MOV     BX,AX           ;BX=N*TYPE
        MOV     CX,AX           ;CX=N*TYPE
        SHR     CX,1            ;CX=N*TYPE/2
                                ;(WORDS TO BE MOVED)
        MOV     SI,[BP]+12      ;SI=ADDR(A)
        MOV     DI,[BP]+10      ;DI=ADDR(B)
BCOPY_LOOP:
        SUB     BX,2            ;NEXT INDEX
        MOV     AX,[SI][BX]     ;GET A
        MOV     [DI][BX],AX     ;STORE B
        LOOP    BCOPY_LOOP
DONE:
        POP     BP
        RET     8
GBCOPY  ENDP
CSEG    ENDS
        END




GBCOPY illustrates backwards operations on an array. Our first task was to locate the last element of each array. If an array element takes TYPE bytes to store and the first element begins at location ADDR, then the second element begins at location ADDR+TYPE, the third at ADDR+2*TYPE, ... and the Nth at ADDR+(N-1)*TYPE. Once these locations are found, GBCOPY is like GCOPY except that GBCOPY subtracts to move down through the elements where GCOPY adds to move up. Because GBCOPY does no type conversion, it simply moves the data as N*TYPE/2 16-bit words, starting with the last word of A.
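The address arithmetic can be checked with a short Python sketch; the function name `gbcopy_plan` is hypothetical. It computes ADDR+(N-1)*TYPE for the last element and, following the listing, the byte offsets of the 16-bit words GBCOPY moves: BX starts at N*TYPE and is reduced by 2 before each word is copied, so the last word of the array is moved first.

```python
def gbcopy_plan(addr, type_, n):
    """Return the address of element N-1 and the byte offsets, in the order
    GBCOPY visits them, of the 16-bit words it moves."""
    last_element = addr + (n - 1) * type_    # ADDR + (N-1)*TYPE
    bx = n * type_                           # MUL CX: BX = N*TYPE
    offsets = []
    for _ in range(n * type_ // 2):          # CX = N*TYPE/2 words
        bx -= 2                              # SUB BX,2
        offsets.append(bx)                   # MOV AX,[SI][BX] / MOV [DI][BX],AX
    return last_element, offsets

print(gbcopy_plan(1000, 4, 3))   # (1008, [10, 8, 6, 4, 2, 0])
```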

Why move an array backwards anyhow? Consider the following two problems. First, copy A(I+1) into A(I) for an entire array. This can be done either in BASIC:

10 FOR I=0 TO N-2
20 A(I)=A(I+1)
30 NEXT I

or with GCOPY:

10 N1%=N-1:TYPEA%=4
20 CALL GCOPY(A(1),A(0),TYPEA%,TYPEA%,N1%)

Second, copy A(I) into A(I+1) for an entire array. One might be tempted to do this in BASIC with

10 FOR I=0 TO N-2
20 A(I+1)=A(I)
30 NEXT I

but this won't work. On the first step, this puts A(0) into A(1). On the next step, when BASIC tries to move A(1), it picks up the value originally in A(0). The original value of A(1) has been wiped out. Correct BASIC code would be

10 FOR I=N-2 TO 0 STEP -1
20 A(I+1)=A(I)
30 NEXT I

GCOPY(A(0),A(1),TYPEA%,TYPEA%,N1%) would generate the same incorrect results as the tempting forward loop. GBCOPY(A(0),A(1),TYPEA%,N1%) works correctly. Since GBCOPY's primary use is copying data from one part of an array to another part of the same array, nothing was lost by omitting the type conversion.
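The difference between the two directions is easy to reproduce in Python. In this sketch (function names are illustrative), `forward_copy` works from the first element up, as GCOPY does, and `backward_copy` works from the last element down, as GBCOPY does. Copying A(I) into A(I+1) comes out right only in the backward direction.

```python
def forward_copy(arr, src, dst, n):
    """Copy element by element from the first element up (like GCOPY)."""
    for i in range(n):
        arr[dst + i] = arr[src + i]

def backward_copy(arr, src, dst, n):
    """Copy from the last element down (like GBCOPY)."""
    for i in range(n - 1, -1, -1):
        arr[dst + i] = arr[src + i]

a = [10, 20, 30, 40]
forward_copy(a, 0, 1, 3)      # A(I+1)=A(I) done the wrong way
print(a)                      # [10, 10, 10, 10]
a = [10, 20, 30, 40]
backward_copy(a, 0, 1, 3)     # A(I+1)=A(I) done correctly
print(a)                      # [10, 10, 20, 30]
```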


On Errors 

Errors that might result from using number crunching subroutines can 
be loosely grouped into four classes: 

• Programming errors in the subroutines. 

• Errors in using the subroutines. 

• Recoverable precision errors. 

• Non-recoverable precision errors. 

Programming Errors 

Computer hardware does not make mistakes. (Not often, anyway.) People who program computers do make mistakes. As you develop your own number crunching routines for the 8087, you'll naturally hit an occasional bug. Be warned that a personal computer is not quite so forgiving when programmed at the machine language level as it is when programmed in BASIC.

About the worst that can happen in a BASIC program, aside from 
getting the wrong answer, is that BASIC halts the program and prints a 
somewhat cryptic error message. Usually, BASIC at least tells you what 
line caused the error. 

What's the worst that can happen with an undebugged machine language program? Frequently, you CALL a machine language program and nothing happens . . . nothing at all happens. The only thing to do is to hit the reset key (Ctrl-Alt-Del on an IBM PC) and restart the system from scratch.

Unfortunately, things can be even a bit worse. Sometimes the reset key doesn't do anything either. A machine language program can, after all, write into any location in memory, including writing garbage into areas that only DOS is supposed to use. When this happens, the only solution is to power down, leave the machine off for a few seconds, and then turn the power back on. It pays to be careful in debugging 8087/8088 programs.

Errors in Using the Subroutines 

Even bug-free routines can go wrong if fed invalid input. As a simple example, suppose we feed the wrong value for N to one of the vector routines prepared above. It would be nice if the routines would check for valid input and return an error indication when given garbage.

Consider what our routines do instead. If N gives the correct length of the data arrays, the routines return the correct answer. Notice that special consideration is given to the case of zero length arrays and these are handled properly. Suppose we set N to a negative value. The routines act as if N were zero, but do not report the error. Suppose instead that the arrays are really 100 long, but we mistakenly set N to 50. The routines give the wrong answer, but return to BASIC without other errors. Suppose we commit the reverse error, setting N to 100 when the arrays are only of length 50. The routines will merrily write into an area of memory assigned to something other than the arrays we are supposed to be using. If we are lucky, the routine will overwrite something vital and the machine will stop cold. In this way we will come to suspect there is an error. If we are unlucky, the routine will change totally unrelated variables, causing our final answers to be wrong without giving any indication of a possible problem.

It is an unfortunate fact of life that there is no sure-fire way to catch these kinds of errors in a machine language program, or, for that matter, in many other computer languages. For the routines in this book, we have decided to place all error checking responsibility on the BASIC programmer. However, it is certainly possible to rewrite the routines to catch a few errors. Routine GADDSAFR (General precision ADDition, but SAFeR) illustrates one such approach.

;SUBROUTINE GADDSAFR(A,B,C,TYPEA,TYPEB,TYPEC,N,IER)
; ASSUMPTIONS: A,B,C ARE ARRAYS OF LENGTH N
;              TYPEA IS AN INTEGER: 2-INTEGER 4-SINGLE 8-DOUBLE
;              TYPEB IS AN INTEGER: SAME CODES AS TYPEA
;              TYPEC IS AN INTEGER: SAME CODES AS TYPEA
;              N IS AN INTEGER
;              IER IS AN INTEGER RETURNING 0 IF NO ERROR
;                 1 IF N IS NEGATIVE
;                 2 IF TYPEA, TYPEB, OR TYPEC IS ILLEGAL

        PUBLIC  GADDSAFR
CSEG    SEGMENT 'CODE'
        ASSUME  CS:CSEG
GADDSAFR PROC   FAR
        PUSH    BP
        MOV     BP,SP
;CHECK TYPES
        MOV     SI,[BP]+14      ;SI=ADDR(TYPEA)
        MOV     AX,[SI]         ;AX=TYPEA
        CMP     AX,2
        JE      TYPEA_OK
        CMP     AX,4
        JE      TYPEA_OK
        CMP     AX,8
        JE      TYPEA_OK
        JMP     TYPE_ERROR
TYPEA_OK:
        MOV     SI,[BP]+12      ;SI=ADDR(TYPEB)
        MOV     AX,[SI]         ;AX=TYPEB
        CMP     AX,2
        JE      TYPEB_OK
        CMP     AX,4
        JE      TYPEB_OK
        CMP     AX,8
        JE      TYPEB_OK
        JMP     TYPE_ERROR
TYPEB_OK:
        MOV     SI,[BP]+10      ;SI=ADDR(TYPEC)
        MOV     AX,[SI]         ;AX=TYPEC
        CMP     AX,2
        JE      TYPEC_OK
        CMP     AX,4
        JE      TYPEC_OK
        CMP     AX,8
        JE      TYPEC_OK
        JMP     TYPE_ERROR
TYPEC_OK:
        JMP     CHECK_N
TYPE_ERROR:
        MOV     AX,2            ;IER=2: ILLEGAL TYPE
        JMP     DONE
CHECK_N:
        MOV     SI,[BP]+8       ;SI=ADDR(N)
        MOV     CX,[SI]         ;CX=N
        MOV     AX,0            ;IER=0 IF N=0
        CMP     CX,0H
        JNE     L11             ;DONE TOO FAR FOR
        JMP     DONE            ;  DIRECT JE
L11:    JG      START_ADD       ;LOOPS: N>0
        MOV     AX,1            ;IER=1: N NEGATIVE
        JMP     DONE
START_ADD:
        MOV     AX,[BP]+20      ;AX=ADDR(A)
        MOV     DI,[BP]+18      ;DI=ADDR(B)
        MOV     DX,[BP]+16      ;DX=ADDR(C)
ADD_LOOP:
        MOV     BX,AX           ;BX=ADDR(A)
        MOV     SI,[BP]+14      ;SI=ADDR(TYPEA)
        MOV     SI,[SI]         ;SI=TYPEA
        CMP     SI,2            ;IS IT INTEGER?
        JNE     A_NOT_INTEGER
        FILD    WORD PTR [BX]
        JMP     ADD_B
A_NOT_INTEGER:
        CMP     SI,4            ;IS IT SINGLE?
        JNE     A_NOT_SINGLE
        FLD     DWORD PTR [BX]
        JMP     ADD_B
A_NOT_SINGLE:
        FLD     QWORD PTR [BX]
ADD_B:  ADD     AX,SI           ;READY FOR NEXT A
        MOV     SI,[BP]+12      ;SI=ADDR(TYPEB)
        MOV     SI,[SI]         ;SI=TYPEB
        CMP     SI,2            ;IS IT INTEGER?
        JNE     B_NOT_INTEGER
        FIADD   WORD PTR [DI]
        JMP     NEXT_C
B_NOT_INTEGER:
        CMP     SI,4            ;IS IT SINGLE?
        JNE     B_NOT_SINGLE
        FADD    DWORD PTR [DI]
        JMP     NEXT_C
B_NOT_SINGLE:
        FADD    QWORD PTR [DI]
NEXT_C: ADD     DI,SI           ;READY FOR NEXT B
        MOV     BX,DX           ;BX=ADDR(C)
        MOV     SI,[BP]+10      ;SI=ADDR(TYPEC)
        MOV     SI,[SI]         ;SI=TYPEC
        CMP     SI,2            ;IS IT INTEGER?
        JNE     C_NOT_INTEGER
        FISTP   WORD PTR [BX]
        JMP     NEXT_ELEMENT
C_NOT_INTEGER:
        CMP     SI,4            ;IS IT SINGLE?
        JNE     C_NOT_SINGLE
        FSTP    DWORD PTR [BX]
        JMP     NEXT_ELEMENT
C_NOT_SINGLE:
        FSTP    QWORD PTR [BX]
NEXT_ELEMENT:
        ADD     DX,SI           ;READY FOR NEXT C
        LOOP    ADD_LOOPER      ;LOOP ONLY JUMPS 127 BYTES
        MOV     AX,0            ;NO ERROR
        JMP     DONE
ADD_LOOPER:
        JMP     ADD_LOOP
DONE:
        MOV     SI,[BP]+6       ;SI=ADDR(IER)
        MOV     [SI],AX         ;IER=ERROR CODE
        POP     BP
        FWAIT                   ;BE SURE 8087 IS DONE
        RET     16
GADDSAFR ENDP
CSEG    ENDS
        END


Error checking adds only about 20 lines of code and a negligible increase in execution time. Unfortunately, many illegal input errors still won't be caught. Besides N simply having the incorrect value, any of the arrays might actually be of a different type than that stated; the type, N, or IER arguments might not be integers; or we might call GADDSAFR with the wrong number or order of arguments.
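The checks GADDSAFR makes before adding anything reduce to a few lines. This Python sketch returns the IER codes the listing documents: type codes are validated first, then N. The function name `gaddsafr_check` is hypothetical. Like the assembly routine, it examines only the argument values, so it cannot detect arrays of the wrong type or length.

```python
VALID_TYPES = (2, 4, 8)   # 2-INTEGER 4-SINGLE 8-DOUBLE

def gaddsafr_check(type_a, type_b, type_c, n):
    """Return the IER code for GADDSAFR's argument checks:
    2 if any type code is illegal (tested first), 1 if N is negative,
    otherwise 0 (N = 0 is legal and simply adds nothing)."""
    for t in (type_a, type_b, type_c):
        if t not in VALID_TYPES:
            return 2
    if n < 0:
        return 1
    return 0

print(gaddsafr_check(4, 4, 8, 100))   # 0
print(gaddsafr_check(4, 3, 8, 100))   # 2
print(gaddsafr_check(2, 2, 2, -1))    # 1
```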


Precision Errors 

A fact of life that programmers find most difficult to accept is that perfectly "correct" programs sometimes give the wrong answer. Computer arithmetic has only limited accuracy. The 8087 is more accurate than most mainframes. Nonetheless, for any finite degree of precision, there exists some problem for which the degree of precision is insufficient. The problem is somewhat aggravated by the fact that a program will work perfectly with one set of data and not at all with another. With some work, one can even construct a series of numbers which add correctly when added from first to last but give a nonsensical result when added backwards.
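Here is one such series, worked in Python under the assumption that every partial sum is rounded to single precision. The 8087's internal 80-bit registers would need far larger numbers to show the same effect. The helper names are illustrative. Added first to last, 1E8-1E8+1 gives the right answer. Added last to first, 1-1E8 rounds back to -1E8, because single precision numbers near 1E8 are spaced 8 apart, so the 1 is lost.

```python
import struct

def to_single(x):
    """Round to single precision, as storing to a DWORD would."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def single_sum(values):
    """Add values in order, rounding every partial sum to single precision."""
    total = 0.0
    for v in values:
        total = to_single(total + v)
    return total

series = [1.0e8, -1.0e8, 1.0]
print(single_sum(series))            # 1.0  (first to last: correct)
print(single_sum(reversed(series)))  # 0.0  (last to first: the 1.0 is lost)
```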
There are several programming approaches to handling precision errors: 

• Ignore the problem and hope no errors ever occur. 

• Handle each error as soon as it occurs. 

• Set up a general scheme to allow computation to proceed as far as 
possible. 

Ignoring the problem is not quite as silly as it sounds. The 8087 is extremely accurate. Furthermore, the 8087 designers have built in automatic error handling capabilities which operate very sensibly. For most problems, precision errors will not occur. For most precision errors that do occur, the 8087 error handling will apply the correct solution.

As an extreme alternative, the 8087 can be set to stop every time an 
error occurs. Exception handling software can be written to take care of 
every error on a problem-specific basis. This approach requires you to 
hand-tailor every subroutine, so it isn't practical for this book. Exception 
handling routines are discussed in the Intel iAPX 86,88 User's Manual. 

In considering a general scheme for error handling, it is instructive to review what BASIC does about the problem. Among BASIC's rules are the following:

• Integer overflow generates an error message and halts the program.

• Real overflow generates an error message. The result is set to machine infinity. Execution continues.

• Real underflow causes the result to be set to zero. Execution continues without a warning message.

• Passing an invalid argument to a function results in an error message. Execution halts.

The error handling routines in the 8087 hardware always allow execution to continue, while generally indicating errors by producing an answer that is not a "normal" number. All our routines allow the 8087 automatic error handling procedures to maintain control. As a result, the final answers may include an error indication. We need a routine to check whether data is valid or invalid. We would also like to fix those errors for which some obvious fix-up exists. Single and double precision output of the 8087 takes one of the following forms, which were discussed at length in Chapter 5:

• Normal—a valid number. 

• Denormal—indicates a previous underflow. 

• Infinity—may indicate a previous overflow. 

• Not-A-Number (NAN)—invalid datum. 

Routine REALERR accepts an input array of single or double precision numbers. It returns three integer variables, IFDEN, IFINF, and IFNAN, each of which is set to -1 (true) if any denormal, infinity, or NAN, respectively, is stored in the array, or to 0 (false) otherwise. (Unnormals are counted as denormals.) ELEMENT is an integer variable giving the element number of the last element found that is neither normal nor zero.
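The following Python sketch states REALERR's contract in high-level terms. It classifies values by magnitude rather than with FXAM, assumes the array's values have already been read into Python floats, and uses 2^-126 and `sys.float_info.min` as the smallest normal single and double values. The names `classify` and `realerr` are illustrative.

```python
import math
import sys

def classify(x, smallest_normal):
    """Name the class of a float, ignoring its sign."""
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "infinity"
    if x == 0.0:
        return "zero"
    if abs(x) < smallest_normal:
        return "denormal"
    return "normal"

def realerr(array, type_, n):
    """Return (IFDEN, IFINF, IFNAN, ELEMENT): -1 (true) / 0 (false) flags,
    and the index of the last element that is neither normal nor zero."""
    smallest = 2.0 ** -126 if type_ == 4 else sys.float_info.min
    ifden = ifinf = ifnan = element = 0
    for i in range(n):
        kind = classify(array[i], smallest)
        if kind in ("normal", "zero"):
            continue
        if kind == "denormal":
            ifden = -1
        elif kind == "infinity":
            ifinf = -1
        else:
            ifnan = -1
        element = i
    return ifden, ifinf, ifnan, element

data = [1.0, 0.0, 1e-40, math.inf, 2.0]
print(realerr(data, 4, len(data)))   # (-1, -1, 0, 3)
```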

Routines REALERR and DENTOO use some processor control instructions defined in Chapter 12. We include these routines here because of their usefulness for even simple numerical programs.

;SUBROUTINE REALERR(ARRAY,TYPE,N,IFDEN,IFINF,IFNAN,ELEMENT)
; ASSUMPTIONS: ARRAY IS OF LENGTH N
;              TYPE IS AN INTEGER: 4-SINGLE, 8-DOUBLE
;              N IS AN INTEGER
;              IFDEN,IFINF,IFNAN ARE INTEGER
;                 RETURNING 0 (FALSE) OR -1 (TRUE)
;              ELEMENT IS AN INTEGER

        PUBLIC  REALERR
CSEG    SEGMENT 'CODE'
        ASSUME  CS:CSEG,ES:EXTRA_SEG
FIRST_INST EQU  THIS WORD
REALERR PROC    FAR
        PUSH    BP
        MOV     BP,SP
;SET UP EXTRA SEGMENT TAKING CARE OF RELOCATION
        PUSH    ES
        CALL    NEXT
NEXT:   POP     AX
        SUB     AX,(OFFSET NEXT)-(OFFSET FIRST_INST)
        MOV     CL,4
        SHR     AX,CL
        MOV     BX,CS
        ADD     BX,EXTRA_SEG
        SUB     BX,CSEG
        ADD     AX,BX
        MOV     ES,AX
;SET TENTATIVE RETURN VALUES TO ZERO
        MOV     SI,[BP]+12      ;CLEAR IFDEN
        MOV     WORD PTR [SI],0
        MOV     SI,[BP]+10      ;CLEAR IFINF
        MOV     WORD PTR [SI],0
        MOV     SI,[BP]+8       ;CLEAR IFNAN
        MOV     WORD PTR [SI],0
        MOV     SI,[BP]+6       ;CLEAR ELEMENT
        MOV     WORD PTR [SI],0
        MOV     SI,[BP]+14      ;SI=ADDR(N)
        MOV     CX,[SI]         ;CX=N
        MOV     BX,[BP]+18      ;BX=ADDR(ARRAY)
        MOV     SI,[BP]+16      ;SI=ADDR(TYPE)
        MOV     AX,[SI]         ;AX=TYPE
;NOW ALL SET UP TO GO
        CMP     CX,0H           ;HOPE N > 0
        JLE     DONE
CHECK_LOOP:
        CMP     AX,4            ;IS A SINGLE?
        JNE     NOT_SINGLE
        FLD     DWORD PTR [BX]
        JMP     CHECK_IT
NOT_SINGLE:                     ;BETTER BE DOUBLE
        FLD     QWORD PTR [BX]
CHECK_IT:
        ADD     BX,AX           ;READY FOR NEXT ELEMENT
        FXAM                    ;WHAT DID WE LOAD?
        FSTSW   STATUS_WORD
        FSTP    ST(0)
        FWAIT
        MOV     DH,BYTE PTR STATUS_WORD+1
        AND     DH,01000101B    ;BLANK OTHER BITS
        CMP     DH,00000100B    ;NORMAL?
        JE      OK
        CMP     DH,01000000B    ;ZERO?
        JE      OK
        CMP     DH,00000101B    ;INFINITY?
        JE      INF
        CMP     DH,0            ;UNNORMAL?
        JE      DEN
        CMP     DH,01000100B    ;DENORMAL?
        JE      DEN
                                ;MUST BE NAN
        MOV     SI,[BP]+8       ;SET IFNAN
        JMP     SET_ERROR
DEN:    MOV     SI,[BP]+12      ;SET IFDEN
        JMP     SET_ERROR
INF:    MOV     SI,[BP]+10      ;SET IFINF
SET_ERROR:
        MOV     WORD PTR [SI],-1        ;ERROR IS TRUE
        MOV     SI,[BP]+14      ;GET N BACK
        MOV     SI,[SI]
        SUB     SI,CX
        MOV     DI,[BP]+6
        MOV     [DI],SI         ;SET ELEMENT
OK:     LOOP    LOOPER
        JMP     DONE
LOOPER: JMP     CHECK_LOOP
DONE:
        POP     ES
        POP     BP
        FWAIT                   ;BE SURE 8087 IS DONE
        RET     14
REALERR ENDP
CSEG    ENDS
EXTRA_SEG SEGMENT 'DATA'
STATUS_WORD DW  ?
EXTRA_SEG ENDS
        END

REALERR is a little complicated, but is nonetheless quite fast, checking 
an array of 10,000 numbers in about a quarter of a second. 
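The condition-code tests in REALERR can be restated in Python. This sketch takes a 16-bit status word, extracts its high byte as MOV DH,BYTE PTR STATUS_WORD+1 does, and applies the same 01000101B mask, which keeps C3, C2, and C0 and discards the sign bit C1. Any combination not listed, including an empty register, is reported as a NAN, just as the listing falls through to its NAN case. The name `fxam_class` is hypothetical.

```python
MASK = 0b01000101          # keep C3 (bit 6), C2 (bit 2), C0 (bit 0); drop sign C1

CODES = {
    0b00000100: "normal",
    0b01000000: "zero",
    0b00000101: "infinity",
    0b00000000: "unnormal",
    0b01000100: "denormal",
}

def fxam_class(status_word):
    """Decode the FXAM result from a 16-bit 8087 status word, as REALERR does."""
    dh = (status_word >> 8) & 0xFF       # high byte of the status word
    return CODES.get(dh & MASK, "nan")   # anything else is treated as a NAN

print(fxam_class(0x0600))   # normal   (C2=1, C1=1: a negative normal)
print(fxam_class(0x4000))   # zero
print(fxam_class(0x0100))   # nan      (C0=1 only)
```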

After execution of REALERR, IFDEN, IFINF, and IFNAN are easily 
tested with BASIC IF statements. The question remains as to what action 
should follow as a result of the test. The following general rules can serve 
as a guide: 

• NAN—halt execution with an error message.

• Infinity—whether to halt execution or allow it to continue depends somewhat on circumstances. Infinity usually indicates a meaningless value, resulting either from an overflow or from some sort of invalid operation. However, there are occasionally functions for which infinity is a sensible number. Consider evaluating the following function:

    1/(1+1/X)

As X goes to zero, the function goes to zero. Since the 8087 is designed to report 1 divided by zero as infinity, 1 plus infinity as infinity, and 1 divided by infinity as zero, this function will be correctly evaluated if we ignore intermediate infinite results. If X equals -1, then the final result will be infinity, as it should be.

• Denormals are a somewhat different case. A denormal indicates that an underflow has occurred. The datum therefore represents a number very close to zero. We can either leave the number as a denormal, in which case the 8087 will continue to treat it as a number very close to zero, or we can set the number to true zero.
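The function 1/(1+1/X) can be evaluated in Python if division by zero is made to behave like the 8087's masked zero-divide response. Python normally raises ZeroDivisionError, so the helper `ieee_div`, a name introduced here, returns an infinity instead. Python's infinities are signed, while the 8087's default projective mode treats plus and minus infinity alike; for the inputs shown, the final results agree.

```python
import math

def ieee_div(a, b):
    """Division with a masked zero-divide response: x/0 gives an infinity
    instead of raising ZeroDivisionError as Python does."""
    if b == 0.0:
        if a == 0.0 or math.isnan(a):
            return math.nan                   # 0/0 is an invalid operation
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b

def f(x):
    """1/(1+1/X), letting intermediate infinities propagate."""
    return ieee_div(1.0, 1.0 + ieee_div(1.0, x))

print(f(0.0))    # 0.0  (1/0 -> infinity, 1+infinity -> infinity, 1/infinity -> 0)
print(f(-1.0))   # inf  (1+1/(-1) = 0, and 1/0 -> infinity)
print(f(1.0))    # 0.5
```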

Routine DENTOO replaces all the denormals in an array with true zeros. 

;SUBROUTINE DENTOO(ARRAY,TYPE,N)
; ASSUMPTIONS: ARRAY IS OF LENGTH N
;              TYPE IS AN INTEGER: 4-SINGLE, 8-DOUBLE
;              N IS AN INTEGER

        PUBLIC  DENTOO
CSEG    SEGMENT 'CODE'
        ASSUME  CS:CSEG,ES:EXTRA_SEG
FIRST_INST EQU  THIS WORD
DENTOO  PROC    FAR
        PUSH    BP
        MOV     BP,SP
;SET UP EXTRA SEGMENT TAKING CARE OF RELOCATION
        PUSH    ES
        CALL    NEXT
NEXT:   POP     AX
        SUB     AX,(OFFSET NEXT)-(OFFSET FIRST_INST)
        MOV     CL,4
        SHR     AX,CL
        MOV     BX,CS
        ADD     BX,EXTRA_SEG
        SUB     BX,CSEG
        ADD     AX,BX
        MOV     ES,AX
        MOV     SI,[BP]+6       ;SI=ADDR(N)
        MOV     CX,[SI]         ;CX=N
        MOV     BX,[BP]+10      ;BX=ADDR(ARRAY)
        MOV     SI,[BP]+8       ;SI=ADDR(TYPE)
        MOV     AX,[SI]         ;AX=TYPE
;NOW ALL SET UP TO GO
        CMP     CX,0H           ;HOPE N > 0
        JLE     DONE
CHECK_LOOP:
        CMP     AX,4            ;IS A SINGLE?
        JNE     NOT_SINGLE
        FLD     DWORD PTR [BX]
        JMP     CHECK_IT
NOT_SINGLE:                     ;BETTER BE DOUBLE
        FLD     QWORD PTR [BX]
CHECK_IT:
        FXAM                    ;WHAT DID WE LOAD?
        FSTSW   STATUS_WORD
        FSTP    ST(0)
        FWAIT
        MOV     DH,BYTE PTR STATUS_WORD+1
        AND     DH,01000101B    ;BLANK OTHER BITS
        CMP     DH,0            ;UNNORMAL?
        JE      DEN
        CMP     DH,01000100B    ;DENORMAL?
        JE      DEN
        JMP     LOOP_BOTTOM
DEN:    FLDZ                    ;MAKE A ZERO
        CMP     AX,4            ;IS A SINGLE?
        JNE     STILL_NOT_SINGLE
        FSTP    DWORD PTR [BX]
        JMP     LOOP_BOTTOM
STILL_NOT_SINGLE:               ;BETTER BE DOUBLE
        FSTP    QWORD PTR [BX]
LOOP_BOTTOM:
        ADD     BX,AX           ;READY FOR NEXT ELEMENT
        LOOP    LOOPER
        JMP     DONE
LOOPER: JMP     CHECK_LOOP
DONE:
        POP     ES
        POP     BP
        FWAIT                   ;BE SURE 8087 IS DONE
        RET     6
DENTOO  ENDP
CSEG    ENDS
EXTRA_SEG SEGMENT 'DATA'
STATUS_WORD DW  ?
EXTRA_SEG ENDS
        END
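What DENTOO does to an array can be modeled in a few lines of Python. This sketch tests magnitudes against the smallest normal single (2^-126) or double (`sys.float_info.min`) value instead of using FXAM, and the names `dentoo` and `SMALLEST_NORMAL` are illustrative. Normals, zeros, infinities, and NANs are left untouched; each denormal becomes a true (positive) zero, as FLDZ followed by FSTP would store.

```python
import struct
import sys

# Smallest normal magnitude for each TYPE code (4-single, 8-double).
SMALLEST_NORMAL = {4: 2.0 ** -126, 8: sys.float_info.min}

def dentoo(array, type_, n):
    """Replace each denormal among the first N elements with a true zero."""
    tiny = SMALLEST_NORMAL[type_]
    for i in range(n):
        x = array[i]
        # abs(NAN) < tiny is false, so NANs are kept; infinities fail too.
        if x != 0.0 and abs(x) < tiny:
            array[i] = 0.0               # like FLDZ then FSTP

# 1E-40 is too small to be a normal single precision number.
single_denormal = struct.unpack("<f", struct.pack("<f", 1e-40))[0]
values = [1.5, single_denormal, -2.0]
dentoo(values, 4, len(values))
print(values)   # [1.5, 0.0, -2.0]
```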

One last warning about ignoring the presence of denormals, infinities, 
and NANs. A few 8087 instructions insist on valid data as input. In 
particular, the transcendental instructions discussed in Chapter 12 will produce 
an undefined result if fed invalid data and will do so without signaling any error 
condition! 

Our error handling has been limited to single and double precision 
reals to the exclusion of integers. There are two reasons for this exclusion. 

First, if you use 16-bit integers, the only kind available in BASIC, for holding numerical results, you are asking for trouble. Merely multiplying two random integers may result in integer overflow! Floating point arithmetic is every bit as fast as integer arithmetic on the 8087. Use integer variables for subscripts, flags, and subroutine addresses. Otherwise stay away.

Second, the integer data type cannot be set to indicate invalid data in the way real variables can be set. If a number cannot be converted to a valid integer, the 8087 reports the most negative value, -32,768. Both BASIC and the 8087 treat -32,768 as they do any other integer, so invalid data will not be flagged. If integer variables must be used, all results should be checked and execution should be stopped if -32,768 appears.
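To see the integer trap concretely, the following Python sketch stores products as 16-bit integers the way the text describes, returning -32,768 when a result does not fit, and then applies the recommended check. Rounding to nearest even (the 8087 default) is assumed, and the names `int16_result` and `check_integers` are hypothetical. Note that a genuine result of -32,768 cannot be told apart from the error value, so it is rejected as well.

```python
import math

INTEGER_INDEFINITE = -32768   # value the 8087 stores when no valid integer exists

def int16_result(x):
    """Store x as a 16-bit integer as FISTP WORD PTR would (round to nearest,
    ties to even), or -32768 if the result cannot be represented."""
    if math.isnan(x) or math.isinf(x):
        return INTEGER_INDEFINITE
    r = round(x)
    return r if -32768 <= r <= 32767 else INTEGER_INDEFINITE

def check_integers(results):
    """Halt (raise) if any result equals -32768, as the text recommends."""
    for i, v in enumerate(results):
        if v == INTEGER_INDEFINITE:
            raise OverflowError(f"element {i}: integer result out of range")
    return results

products = [int16_result(a * b) for a, b in [(100, 200), (300, 200)]]
print(products)   # [20000, -32768]  (60000 does not fit in 16 bits)
```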




Basic Matrix Operations 


Matrix operations occupy the center of the number crunching world. Large 
scale supercomputers, costing tens of millions of dollars, have special 
built-in hardware devoted entirely to fast matrix operations. There are 
even computer languages, such as APL, where the matrix replaces the 
scalar as the fundamental variable type. Matrices are so important that 
some versions of BASIC (mostly on large computers) have a special set 
of "MAT" functions devoted to efficient matrix computation. While the 
8087 does not have matrix hardware, its stack design makes it easy to write efficient matrix subroutines.

We cover matrix operations in two chapters. In this chapter, we prepare 
routines for the most common matrix operations. Chapter 11 concentrates 
on advanced methods for solving systems of linear equations and on the 
related problem of matrix inversion. 



The Cookbook—Chapter 10 

Program: 

COLCOPY 

Purpose: 

Copy one column of a matrix into a vector. 

Call: 

CALL COLCOPY(A(0,0),B(0),COL,N,M). 

Input: 

A—N by M single precision matrix. 

COL—integer column number to be copied. 

N—integer number of rows of A. 

M—integer number of columns of A. 

Output: 

B—array N long; B(I) = A(I,COL). 

Language: 

8088 assembly language. 

Program: 

ROWCOPY 

Purpose: 

Copy one row of a matrix into a vector. 

Call: 

CALL ROWCOPY(A(0,0),B(0),ROW,N,M). 

Input: 

A—N by M single precision matrix. 

ROW—integer row number to be copied. 

N—integer number of rows of A. 

M—integer number of columns of A. 

Output:

B—array M long; B(I) = A(ROW,I). 

Language: 

8088 assembly language. 

Program: 

DIAGCOPY 

Purpose: 

Copy the diagonal of a square matrix into a vector. 

Call: 

CALL DIAGCOPY(A(0,0),B(0),N). 

Input: 

A—N by N single precision matrix. 

N—integer number of rows of A. 

Output: 

B—array N long; B(I) = A(I,I).

Language: 

8088 assembly language. 

Program: 

TRANS 

Purpose: 

Transpose a matrix. 

Call: 

CALL TRANS(A(0,0),B(0,0),N,M). 

Input: 

A—N by M single precision matrix. 

N—integer number of rows of A. 

M—integer number of columns of A. 

Output: 

B—M by N matrix; B(I,J) = A(J,I).

Language: 

8088 assembly language. 

Program: 

SQTRANS 

Purpose: 

Transpose a square matrix in place. 

Call: 

CALL SQTRANS(A(0,0),N). 

Input: 

A—N by N single precision matrix. 

N—integer number of rows of A. 

Output: 

A—new A(I,J) = old A(J,I).

Language: 

8088 assembly language. 

Program: 

INPROD 

Purpose: 

Inner product of two single precision vectors. 

Call: 

CALL INPROD(A(0),B(0),C,N). 

Input: 

A—N long single precision vector. 

B—N long single precision vector. 

N—integer number of rows of A. 

Output: 

C—double precision scalar; C=inner product of A,B.

Language: 

8087/8088 assembly language. 

Program: 

GINP 

Purpose: 

Inner product of two generalized vectors. 

Input: 

A—N element vector. 

B—N element vector. 

TYPEA—integer giving length of element of A. 
TYPEB—integer giving length of element of B. 

SKIPA—integer "skip factor" (see text) for A.

SKIPB—integer "skip factor" (see text) for B. 

N—integer number of rows of A. 

Output: 

8087 register ST; ST = inner product of A,B. 

Language: 

8087/8088 assembly language. 

Note: 

NEAR procedure; see GINPROD. 
Program:

GINPROD

Purpose:

Inner product of two generalized vectors.

Call:

CALL GINPROD(A(0),B(0),C,TYPEA,TYPEB,SKIPA,SKIPB,N).

Input:

A—N element vector.

B—N element vector.

TYPEA—integer giving length of element of A.

TYPEB—integer giving length of element of B.

SKIPA—integer "skip factor" (see text) for A.

SKIPB—integer "skip factor" (see text) for B.

N—integer number of rows of A.

Output:

C—double precision scalar; C=inner product of A,B.

Language:

8087/8088 assembly language.

Note:

Requires NEAR procedure GINP.

Program:

MATMULT

Purpose:

Matrix multiplication.

Call:

CALL MATMULT(A(0,0),B(0,0),C(0,0),L,M,N).

Input:

A—L by M single precision matrix.

B—M by N single precision matrix.

L—integer number of rows of A.

M—integer number of columns of A, rows of B.

N—integer number of columns of B.

Output:

C—L by N single precision matrix; C = AB.

Language:

8087/8088 assembly language.

Note:

Requires NEAR procedure GINP.

Program:

GAUSS

Purpose:

Solve linear equations by Gaussian elimination.

Input:

A—N by N coefficient matrix.

Y—N vector.

N—number of rows and columns of A.

Output:

X—N vector; X solves equations Y = AX.

Language:

BASIC.

Program:

GAUSS-SE

Purpose:

Solve linear equations by Gaussian elimination, using space efficient method.

Input:

A—N by N coefficient matrix.

Y—N vector.

YSTAR—N vector, scratch space.

N—number of rows and columns of A.

Output:

X—N vector; X solves equations Y = AX.

A—A replaced with Gaussian reduction.

Language:

BASIC.
What is a Matrix?

In computer terms, a matrix is a two-dimensional array. The values in
the array can be thought of as being laid out in a rectangular grid, where
the first array index is the row number and the second array index is the
column number. An example of a "2 by 3" matrix is

     5   -7   11
     0    2   18

Such a matrix might be stored in BASIC by DIMensioning an array
with two rows and three columns. The BASIC statement "DIM A(1,2)"
produces a matrix laid out like

     A(0,0)   A(0,1)   A(0,2)
     A(1,0)   A(1,1)   A(1,2)

Since BASIC arrays are numbered starting at zero, an N row by M column
matrix is dimensioned A(N-1,M-1).


Why Are Matrices Interesting? 

Invariably, matrix algebra is motivated as notation for solving systems 
of simultaneous linear equations. This may seem a bit strange, as most 
of us don't have any great need for solving such systems. The truth is 
that most interesting numerical computation problems have the same 
mathematical structure as a system of linear equations. Computational 
aspects of statistics, differential equations, and constrained optimization 
all center around linear equations and matrix operations. We briefly lay 
out the linear equation interpretation of matrices here. 

As a sample, consider the following system of two linear equations in 
two unknowns. 

$$\begin{aligned} 18 &= 4x_1 + 2x_2 \\ 9 &= 2x_1 - 2x_2 \end{aligned}$$

There is exactly one pair of values for $x_1$ and $x_2$ that will make both
equations true. To find these values, we draw the two equations on a
piece of graph paper. Label the horizontal axis $x_1$ and the vertical axis
$x_2$. Pick any two values for $x_1$, plug each into the top equation, and solve
for the corresponding value of $x_2$. Connect the two $(x_1,x_2)$ points to get
a straight line. Do the same for the bottom equation. The top equation
is true for any $(x_1,x_2)$ point on the first line and the bottom equation is
true for any point on the second line. Where the two lines intersect, both
equations are true. The point (4.5,0) is the solution to this system of
equations.
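The graphical answer can also be checked numerically. The sketch below does not come from the book. It solves the 2 by 2 system with Cramer's rule, a closed-form formula that works only for this small case, and the function name solve_2x2 is our own illustration.

```python
def solve_2x2(a00, a01, a10, a11, y0, y1):
    """Solve y0 = a00*x1 + a01*x2 and y1 = a10*x1 + a11*x2 by Cramer's rule."""
    det = a00 * a11 - a01 * a10
    if det == 0:
        # Parallel (or identical) lines: no single intersection point.
        raise ValueError("system has no unique solution")
    x1 = (y0 * a11 - a01 * y1) / det
    x2 = (a00 * y1 - y0 * a10) / det
    return x1, x2

# The system in the text: 18 = 4x1 + 2x2, 9 = 2x1 - 2x2
x1, x2 = solve_2x2(4, 2, 2, -2, 18, 9)
```

The result is the intersection point (4.5, 0) found on the graph paper.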

Matrices provide a compact notation for discussing such systems. In
matrix notation, the two equations appear as:

$$y = Ax$$

where y is a 2 by 1 matrix, A is a 2 by 2 matrix, and x is a 2 by 1 matrix:

$$y = \begin{pmatrix} 18 \\ 9 \end{pmatrix}, \qquad A = \begin{pmatrix} 4 & 2 \\ 2 & -2 \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

If we are given values for x, we can solve for y by matrix multiplication. 
If we are given values for y, we can solve for x by solving a system of linear 
equations. 


Storage Allocation and Memory Access 

In order to manipulate matrices, we need to know how BASIC stores a
matrix in memory. If A is a 2 by 3 matrix, we can think of it as being
logically laid out as shown in Figure 10.1. Since computer memory is
one-dimensional, BASIC stores the six elements in consecutive
order with the first subscript varying most rapidly as we move up in
memory. The two-dimensional matrix is placed in memory in this order:
A(0,0), A(1,0), A(0,1), A(1,1), A(0,2), A(1,2). Each element occupies four
bytes for a single precision array and eight bytes for double precision.

Another way to say the same thing is that BASIC stores each column
in order, placing one column after the next in memory. Suppose the
(single precision) matrix A is stored in memory with A(0,0) located at
memory address 100. The first column of A will be at locations 100 and
104; the second column at 108 and 112; the third at 116 and 120. The first
row of A will be at locations 100, 108, and 116; the second at 104, 112,
and 120. Figure 10.1 illustrates this mapping from the two-dimensional
array to one-dimensional memory.

     A(0,0)-100   A(0,1)-108   A(0,2)-116

     A(1,0)-104   A(1,1)-112   A(1,2)-120

Figure 10.1

In general, for an n by m matrix, element (i,j) is stored (i + n*j)*k bytes
past element (0,0), where k equals four for single and eight for double
precision.
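A short check of the general formula follows. It is a Python sketch that assumes nothing beyond the rule just stated: the first subscript varies fastest, and each element is k bytes long. The function name element_offset is illustrative.

```python
def element_offset(i, j, n, k=4):
    """Byte offset of A(i,j) from A(0,0) for a matrix with n rows stored
    column by column (first subscript varies fastest), k bytes per element."""
    return (i + n * j) * k

# Rebuild Figure 10.1: a 2 by 3 single precision matrix with A(0,0) at address 100.
base = 100
for i in range(2):
    print([base + element_offset(i, j, n=2) for j in range(3)])
# prints [100, 108, 116]
#        [104, 112, 120]
```

For double precision, pass k=8. The same element A(1,2) then sits 40 bytes past A(0,0) instead of 20.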
Notice that a 1 by n matrix, called a row vector, and an n by 1 matrix,
called a column vector, will both be stored in the same locations as an n
element one-dimensional array.

It is often convenient to think of a matrix as a set of column vectors
or a set of row vectors. The routines COLCOPY and ROWCOPY illustrate
column and row access. COLCOPY(A,B,COL,N,M) copies column COL
of the N by M matrix A into the N long array B. Analogously, ROWCOPY
copies row ROW of matrix A into an M long array B. The BASIC code
below illustrates COLCOPY. (Note that here, as elsewhere, we have
written "DIM A(N-1,M-1)" for clarity, where BASIC actually requires
"N1=N-1:M1=M-1:DIM A(N1,M1)".)

10 DEFINT I-N
15 REM DEFINE N,M HERE
20 DIM A(N-1,M-1),B(N-1)
25 REM FILL IN VALUES OF A
30 FOR I=0 TO N-1
40 B(I)=A(I,COL)
50 NEXT I


; SUBROUTINE COLCOPY(A,B,COL,N,M)
; ASSUMPTIONS: A IS A SINGLE PRECISION N BY M MATRIX
;              B IS A SINGLE PRECISION ARRAY N LONG
;              COL,N,M ARE INTEGERS

         PUBLIC  COLCOPY
CSEG     SEGMENT 'CODE'
         ASSUME  CS:CSEG
COLCOPY  PROC    FAR
         PUSH    BP
         MOV     BP,SP
         MOV     BX,[BP]+8       ;BX = ADDR(N)
         MOV     CX,[BX]         ;CX = N
         MOV     BX,[BP]+10      ;BX = ADDR(COL)
         MOV     AX,[BX]         ;AX = COL
         MUL     CX              ;AX = N*COL
         SHL     AX,1            ;AX = 4*N*COL
         SHL     AX,1
         MOV     SI,[BP]+14      ;SI = ADDR(A)
         ADD     SI,AX           ;SI = ADDR(A(0,COL))
         MOV     DI,[BP]+12      ;DI = ADDR(B)
         JCXZ    DONE
COL_LOOP:
         MOV     AX,[SI]         ;MOVE ONE 4-BYTE ELEMENT
         MOV     [DI],AX
         MOV     AX,[SI]+2
         MOV     [DI]+2,AX
         ADD     SI,4            ;NEXT ELEMENT OF COLUMN
         ADD     DI,4            ;NEXT B
         LOOP    COL_LOOP
DONE:
         POP     BP
         RET     10
COLCOPY  ENDP
CSEG     ENDS
         END




; SUBROUTINE ROWCOPY(A,B,ROW,N,M)
; ASSUMPTIONS: A IS A SINGLE PRECISION N BY M MATRIX
;              B IS A SINGLE PRECISION ARRAY M LONG
;              ROW,N,M ARE INTEGERS

         PUBLIC  ROWCOPY
CSEG     SEGMENT 'CODE'
         ASSUME  CS:CSEG
ROWCOPY  PROC    FAR
         PUSH    BP
         MOV     BP,SP
         MOV     BX,[BP]+6       ;BX = ADDR(M)
         MOV     CX,[BX]         ;CX = M
         MOV     BX,[BP]+10      ;BX = ADDR(ROW)
         MOV     AX,[BX]         ;AX = ROW
         SHL     AX,1            ;AX = 4*ROW
         SHL     AX,1
         MOV     SI,[BP]+14      ;SI = ADDR(A)
         ADD     SI,AX           ;SI = ADDR(A(ROW,0))
         MOV     DI,[BP]+12      ;DI = ADDR(B)
         MOV     BX,[BP]+8       ;BX = ADDR(N)
         MOV     BX,[BX]         ;BX = N
         SHL     BX,1            ;BX = 4*N
         SHL     BX,1
         JCXZ    DONE
ROW_LOOP:
         MOV     AX,[SI]         ;MOVE ELEMENT OF ROW
         MOV     [DI],AX
         MOV     AX,[SI]+2
         MOV     [DI]+2,AX
         ADD     SI,BX           ;NEXT ELEMENT OF ROW
         ADD     DI,4            ;NEXT B
         LOOP    ROW_LOOP
DONE:
         POP     BP
         RET     10
ROWCOPY  ENDP
CSEG     ENDS
         END




COLCOPY and ROWCOPY illustrate four useful points about moving
through a matrix (locations are byte offsets from A(0,0)):

• Column COL begins at location 4*N*COL.

• Sequential elements in a column are located 4 bytes apart.

• Row ROW begins at location 4*ROW.

• Sequential elements in a row are located 4*N bytes apart.

Of course, "8" would replace "4" for a double precision matrix.
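In a high-level language, these four rules reduce to a starting index and a stride. The minimal Python sketch below treats memory as a flat list with one slot per element, so each byte offset is divided by the element size. The names colcopy and rowcopy echo the routines, but the functions illustrate the access pattern and are not translations of the 8088 code.

```python
def colcopy(a, col, n):
    """Column COL of an n-row matrix held column by column in the flat list a."""
    start = n * col                   # A(0,COL) is element n*COL (byte 4*N*COL)
    return a[start:start + n]         # column neighbours are adjacent

def rowcopy(a, row, n, m):
    """Row ROW of an n by m matrix held column by column in the flat list a."""
    return a[row:row + n * m:n]       # A(ROW,0) is element ROW; then step n elements

# The 2 by 3 example matrix, in BASIC's order A(0,0), A(1,0), A(0,1), ...
a = [5, 0, -7, 2, 11, 18]
print(colcopy(a, 1, 2))      # [-7, 2]
print(rowcopy(a, 1, 2, 3))   # [0, 2, 18]
```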

A matrix with an equal number of rows and columns is called, for
obvious reasons, a square matrix. The elements A(0,0), A(1,1), . . .,
A(N-1,N-1) form the principal diagonal of the matrix. To illustrate
accessing the principal diagonal, we present DIAGCOPY:

; SUBROUTINE DIAGCOPY(A,B,N)
; ASSUMPTIONS: A IS A SINGLE PRECISION N BY N MATRIX
;              B IS A SINGLE PRECISION ARRAY N LONG
;              N IS AN INTEGER

         PUBLIC  DIAGCOPY
CSEG     SEGMENT 'CODE'
         ASSUME  CS:CSEG
DIAGCOPY PROC    FAR
         PUSH    BP
         MOV     BP,SP
         MOV     BX,[BP]+6       ;BX = ADDR(N)
         MOV     CX,[BX]         ;CX = N
         MOV     BX,CX           ;BX = N
         INC     BX              ;BX = N+1
         SHL     BX,1            ;BX = 4*(N+1)
         SHL     BX,1            ;NOTE BX HAS DISTANCE BETWEEN
                                 ;DIAGONAL ELEMENTS
         MOV     SI,[BP]+10      ;SI = ADDR(A)
         MOV     DI,[BP]+8       ;DI = ADDR(B)
         JCXZ    DONE
DIAG_LOOP:
         MOV     AX,[SI]         ;MOVE ONE ELEMENT
         MOV     [DI],AX
         MOV     AX,[SI]+2
         MOV     [DI]+2,AX
         ADD     SI,BX           ;NEXT ELEMENT
         ADD     DI,4            ;NEXT B
         LOOP    DIAG_LOOP
DONE:
         POP     BP
         RET     6
DIAGCOPY ENDP
CSEG     ENDS
         END



Moving along the diagonal is equivalent to moving down one row and
over one column. Note the following two facts about accessing elements of
the diagonal of a square matrix:

• Diagonal element i is at location 4*i*(N+1).

• Sequential diagonal elements are 4*(N+1) bytes apart.
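The diagonal rule can be written as one strided slice. Below is a Python sketch on a flat, column-ordered list (one slot per element, so 4*(N+1) bytes becomes a step of n+1 slots). The name diagcopy is borrowed from the routine for illustration only.

```python
def diagcopy(a, n):
    """Principal diagonal of an n by n matrix stored column by column."""
    # Diagonal element i is at index i*(n+1): down one row (+1), over one column (+n).
    return a[0:n * n:n + 1]

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # 3 by 3, column by column: A(i,j) = a[i + 3*j]
print(diagcopy(a, 3))             # [1, 5, 9]
```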
Basic Matrix Operations

Matrix operations fall into six categories: 

1. Scalar operations. 

2. Element-by-element operations. 

3. Matrix transposition. 

4. Inner products and matrix multiplication. 

5. Solving systems of linear equations. 

6. Matrix inversion. 


Scalar and Element-by-Element Operations

Operations between scalars and matrices work by applying the scalar
operation to every element of the matrix. For example, if A is an n by
m matrix, the mathematical operation B = A + 5 could be done with
the BASIC program:

10 DEFINT I-N
20 DIM A(N-1,M-1),B(N-1,M-1)
30 FOR I=0 TO N-1
40 FOR J=0 TO M-1
50 B(I,J)=A(I,J)+5
60 NEXT J
70 NEXT I




This BASIC program could be replaced with the 8087 subroutine SCALADD.

; SUBROUTINE SCALADD(A,SCALAR,B,N,M)
; ASSUMPTIONS: A,B ARE SINGLE PRECISION N BY M MATRICES
;              SCALAR IS SINGLE PRECISION
;              N,M ARE INTEGERS

         PUBLIC  SCALADD
CSEG     SEGMENT 'CODE'
         ASSUME  CS:CSEG
SCALADD  PROC    FAR
         PUSH    BP
         MOV     BP,SP
         MOV     BX,[BP]+12          ;BX = ADDR(SCALAR)
         FLD     DWORD PTR [BX]      ;PUSH SCALAR ONTO 8087 STACK
         MOV     BX,[BP]+6           ;BX = ADDR(M)
         MOV     DX,[BX]             ;DX = # OF COLUMNS
         MOV     SI,[BP]+14          ;SI = ADDR(A)
         FWAIT
         MOV     DI,[BP]+10          ;DI = ADDR(B)
COLUMN_LOOP:
         MOV     BX,[BP]+8           ;BX = ADDR(N)
         MOV     CX,[BX]             ;CX = COLUMN LENGTH
         MOV     BX,0
ADD_LOOP:
         FLD     DWORD PTR [BX][SI]  ;LOAD A(I,J)
         FADD    ST(0),ST(1)         ;ADD SCALAR
         FSTP    DWORD PTR [BX][DI]  ;B(I,J) = SCALAR + A(I,J)
         ADD     BX,4                ;READY FOR NEXT ELEMENT
         LOOP    ADD_LOOP
                                     ;MOVE TO NEXT COLUMN BY ADDING 4*N
                                     ;TO SI AND DI
         MOV     BX,[BP]+8           ;BX = ADDR(N)
         MOV     AX,[BX]             ;AX = COLUMN LENGTH
         SHL     AX,1                ;MULTIPLY AX
         SHL     AX,1                ;BY 4
         ADD     SI,AX
         ADD     DI,AX
         DEC     DX                  ;WE DONE YET?
         CMP     DX,0
         JG      COLUMN_LOOP         ;REPEAT WHILE COLUMNS REMAIN
         FSTP    ST(0)               ;GET RID OF SCALAR
         POP     BP
         FWAIT
         RET     10
SCALADD  ENDP
CSEG     ENDS
         END


Routine SCALADD takes about 53 microseconds per element. The time
for the same routine in BASIC varies according to the number of rows
and columns, but, for a 50 by 50 matrix, BASIC requires about 6400
microseconds per element.

SCALADD illustrates indexing down the columns and across the rows
of a matrix. It would be straightforward to write routines for the other
scalar operations as well as for element-by-element matrix addition,
subtraction, and so forth. However, a simple observation suggests
an even easier solution. Computer memory doesn't "know" that the n
by m storage locations represent a matrix. The locations could equally
well represent an n*m element one-dimensional array. All element-by-element
and scalar matrix operations can be done by using the vector routines
developed in Chapter 9.

For example, the following BASIC code, using ADDSC from Chapter 
9, works as well as SCALADD. 

10 DEFINT I-N
20 DIM A(N-1,M-1),B(N-1,M-1)
30 SCALAR=5.0: ISIZE=N*M
40 CALL ADDSC(A(0,0),SCALAR,B(0,0),ISIZE)
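The following Python sketch shows why the vector shortcut is safe. Looping over rows and columns visits exactly the same n*m storage slots as a single pass over a flat vector. It does not model ADDSC's actual interface, which is defined in Chapter 9. The two function names are hypothetical.

```python
def add_scalar_2d(a, n, m, scalar):
    """B = A + scalar with explicit column and row loops, like SCALADD."""
    b = [0.0] * (n * m)
    for j in range(m):             # down each column ...
        for i in range(n):         # ... one element at a time
            b[i + n * j] = a[i + n * j] + scalar
    return b

def add_scalar_flat(a, scalar):
    """The same operation, treating the n*m storage locations as one vector."""
    return [x + scalar for x in a]

a = [5.0, 0.0, -7.0, 2.0, 11.0, 18.0]   # the 2 by 3 example, column by column
assert add_scalar_2d(a, 2, 3, 5.0) == add_scalar_flat(a, 5.0)
```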
Matrix Transposition

The matrix operation transpose exchanges the rows and columns of a
matrix. If A is an n by m matrix, then "A transpose" is an m by n matrix
such that (A transpose)(i,j) = A(j,i). A transpose is often written $A^T$ or
A' (pronounced "A transpose" or "A prime"). A BASIC program to
transpose a matrix is straightforward. For example:

10 DEFINT I-N
20 DIM A(N-1,M-1),AT(M-1,N-1)
30 FOR I=0 TO N-1
40 FOR J=0 TO M-1
50 AT(J,I)=A(I,J)
60 NEXT J
70 NEXT I

The 8088 subroutine TRANS accomplishes the same task as the BASIC 
code above. We take advantage of the fact that we can move down the 
columns of A by counting off memory locations four at a time and move 
across the rows of A' by counting off memory locations 4*M bytes at a 
time. 

Notice that TRANS requires A and B to be different matrices. If A and
B were to occupy the same memory locations, the copying operations
would write over some A locations before we were able to read them
into B. Subroutine TRANS uses about 16 microseconds per element.


; SUBROUTINE TRANS(A,B,N,M)
; ASSUMPTIONS: A IS A SINGLE PRECISION N BY M MATRIX
;              B IS A SINGLE PRECISION M BY N MATRIX
;              N,M ARE INTEGERS

         PUBLIC  TRANS
CSEG     SEGMENT 'CODE'
         ASSUME  CS:CSEG
TRANS    PROC    FAR
         PUSH    BP
         MOV     BP,SP
         MOV     BX,[BP]+8       ;BX = ADDR(N)
         MOV     CX,[BX]         ;CX = N
         JCXZ    DONE
         MOV     BX,[BP]+6       ;BX = ADDR(M)
         MOV     DX,[BX]         ;DX = M
         CMP     DX,0
         JLE     DONE
         MOV     SI,[BP]+12      ;SI = ADDR(A)
         MOV     DI,[BP]+10      ;DI = ADDR(B)
ROW_LOOP:
         MOV     BX,[BP]+8       ;BX = ADDR(N)
         MOV     CX,[BX]         ;CX = N (COL LENGTH)
         MOV     BX,0            ;BX = 0
COL_LOOP:
         MOV     AX,[SI]         ;MOVE 4 BYTES
         MOV     [DI][BX],AX
         MOV     AX,[SI]+2
         MOV     [DI][BX]+2,AX
         ADD     SI,4            ;NEXT A ELEMENT
         MOV     AX,BX           ;SAVE B ROW POSITION
         MOV     BX,[BP]+6       ;BX = ADDR(M)
         MOV     BX,[BX]         ;BX = M
         SHL     BX,1            ;BX = 4*M
         SHL     BX,1
         ADD     BX,AX           ;NEXT B ELEMENT
         LOOP    COL_LOOP        ;IF NOT DONE
         ADD     DI,4            ;NEXT ROW OF B
         DEC     DX              ;ONE ROW DONE
         JG      ROW_LOOP
DONE:
         POP     BP
         RET     8
TRANS    ENDP
CSEG     ENDS
         END
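The index arithmetic in TRANS can be modeled in Python on a flat list that stands in for memory, with one slot per four-byte element. This models the access pattern only and is not a translation of the listing. The name trans is illustrative.

```python
def trans(a, n, m):
    """B = A' for an n by m matrix A stored column by column.

    B is m by n, so B(j,i) sits at index j + m*i. A separate output list is
    used because, as with TRANS, copying into A itself would overwrite
    elements before they are read."""
    b = [None] * (n * m)
    for j in range(m):                  # column j of A becomes row j of B
        for i in range(n):
            b[j + m * i] = a[i + n * j] # A read sequentially; B stepped by m
    return b

print(trans([5, 0, -7, 2, 11, 18], 2, 3))   # [5, -7, 11, 0, 2, 18]
```

A is read straight through memory while B is written in jumps of m slots, which matches the 4 and 4*M byte steps used by the assembly routine.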




Transposition of a square matrix leads to an important special case. To 
conserve space, we frequently transpose a square matrix "in place," as 
in the following BASIC code. Notice that the second FOR loop only runs 
from the diagonal element to the end of the row. The "lower triangle" 
of the square gets swapped with the upper triangle. 

10 DEFINT I-N
20 DIM A(N-1,N-1)
30 FOR I=0 TO N-1
40 FOR J=I TO N-1
50 SWAP A(I,J),A(J,I)
60 NEXT J
70 NEXT I

We can think of this code as moving along the diagonal of a matrix
and swapping the row from the diagonal point to the right with the
column from the diagonal down. Subroutine SQTRANS performs this
task. The BASIC code above takes about 2800*n^2 microseconds to transpose
A in place. SQTRANS requires only 8*n^2 microseconds.

; SUBROUTINE SQTRANS(A,N)
; ASSUMPTIONS: A IS A SINGLE PRECISION N BY N MATRIX
;              N IS AN INTEGER

         PUBLIC  SQTRANS
CSEG     SEGMENT 'CODE'
         ASSUME  CS:CSEG
SQTRANS  PROC    FAR
         PUSH    BP
         MOV     BP,SP
         MOV     BX,[BP]+6       ;BX = ADDR(N)
         MOV     DX,[BX]         ;DX = N
         MOV     BX,DX           ;BX = N
         SHL     BX,1            ;BX = 4*N
         SHL     BX,1
         MOV     BP,[BP]+8       ;WE'RE SHORT OF REGISTERS
                                 ;SO WE'LL USE BP TO POINT
                                 ;TO DIAGONAL ELEMENT
                                 ;BP = ADDR(A)
         CMP     DX,0
         JLE     DONE
DIAG_LOOP:
         MOV     CX,DX           ;DX = # OF ELEMENTS LEFT
         MOV     SI,BP           ;SI POINTS TO ROW
         MOV     DI,BP           ;DI POINTS TO COLUMN
ROW_LOOP:
         MOV     AX,[SI]         ;SWAP ROW AND COLUMN
         XCHG    [DI],AX         ;MOVE 4 BYTES
         MOV     [SI],AX
         MOV     AX,[SI]+2
         XCHG    [DI]+2,AX
         MOV     [SI]+2,AX
         ADD     SI,BX           ;NEXT ROW ELEMENT
         ADD     DI,4            ;NEXT COLUMN ELEMENT
         LOOP    ROW_LOOP        ;IF NOT DONE
         ADD     BP,BX           ;NEXT DIAGONAL ELEMENT IS
         ADD     BP,4            ;4*(N+1) BYTES FURTHER ON
         DEC     DX              ;NEXT COLUMN IS SHORTER
         CMP     DX,0
         JG      DIAG_LOOP
DONE:
         POP     BP
         RET     4
SQTRANS  ENDP
CSEG     ENDS
         END
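The following Python sketch models the swap pattern of the in-place transpose on a flat, column-ordered list. From each diagonal element, the row segment (step n) is swapped with the column segment (step 1), just as SI and DI step by 4*N and 4 bytes. The function name is illustrative.

```python
def sqtrans(a, n):
    """Transpose an n by n matrix, stored column by column, in place."""
    for d in range(n):                  # walk down the diagonal
        for k in range(d, n):           # from the diagonal to the edge
            r = d + n * k               # A(d,k): row neighbours are n apart
            c = k + n * d               # A(k,d): column neighbours are adjacent
            a[r], a[c] = a[c], a[r]     # swap row segment with column segment
    return a

print(sqtrans([1, 2, 3, 4, 5, 6, 7, 8, 9], 3))   # [1, 4, 7, 2, 5, 8, 3, 6, 9]
```

The first swap at each diagonal position exchanges the diagonal element with itself. The assembly routine also performs this harmless self-exchange.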




Inner Products and Matrix Multiplication 

More scientific computation time is spent computing inner products than 
on any other single problem. Inner products are at the heart of both 
matrix multiplication and matrix inversion. If x and y are vectors, then 
to find the inner product of x and y one multiplies the two vectors element 
by element and sums the products, as in the following BASIC program. 

10 DEFINT I-N
20 DEFDBL S
30 DIM X(N-1),Y(N-1)
40 SUM=0
50 FOR I=0 TO N-1
60 SUM=SUM+X(I)*Y(I)
70 NEXT I
At first glance, the inner product doesn't appear to be a particularly
interesting operation. However, consider the specification of our system
of linear equations earlier in the chapter. The first equation was

$$y_1 = A_{0,0}x_1 + A_{0,1}x_2$$

Thus $y_1$ equals the inner product of the first row of A with the vector x.
Similarly, the second equation specifies that $y_2$ equals the inner product
of the second row of A with x. In this manner, an entire system of
equations can be specified in terms of inner products. This leads to a
natural definition of matrix multiplication in terms of inner products.

If C = AB, then C(i,j) equals the inner product of row i of A with column j of B.

Note that this definition implicitly assumes that A and B are conformable
for multiplication, that is, the number of columns of A equals the number
of rows of B. A further natural result of the definition is that if A is an l
by m matrix and B is an m by n matrix, then C will be l by n.
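The definition translates almost word for word into code. The Python sketch below assumes all three matrices are flat lists in BASIC's column-by-column order, so row i of A is every l-th element and column j of B is m adjacent elements. The names inner and matmult are illustrative.

```python
def inner(x, y):
    """Inner product: multiply element by element and sum the products."""
    return sum(a * b for a, b in zip(x, y))

def matmult(a, b, l, m, n):
    """C = AB for A l by m and B m by n, all stored column by column.

    C(i,j) is the inner product of row i of A with column j of B."""
    c = [0.0] * (l * n)
    for j in range(n):
        col_b = b[m * j:m * j + m]          # column j of B: m adjacent elements
        for i in range(l):
            row_a = a[i:i + l * m:l]        # row i of A: every l-th element
            c[i + l * j] = inner(row_a, col_b)
    return c

# y = Ax for the 2 by 2 example with x = (4.5, 0)
print(matmult([4, 2, 2, -2], [4.5, 0], 2, 2, 1))   # [18.0, 9.0]
```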

BASIC code to multiply two matrices is: 

10 DEFINT I-N
20 DEFDBL S
25 REM REMEMBER TO DEFINE L,M,N AND USE L1=L-1,ETC IN LINE 30
30 DIM A(L-1,M-1),B(M-1,N-1),C(L-1,N-1)
35 REM DEFINE MATRICES A AND B HERE
40 FOR IROW=0 TO L-1
50 FOR JCOL=0 TO N-1
60 SUM=0
70 FOR K=0 TO M-1
80 SUM=SUM+A(IROW,K)*B(K,JCOL)
90 NEXT K
100 C(IROW,JCOL)=SUM
110 NEXT JCOL
120 NEXT IROW

Lines 70, 80, and 90 are executed l*m*n times. For matrices of order
50, that's 125,000 additions and multiplications. You can see why we
want these lines to be as efficient as possible!

Notice that we collected the inner product in a temporary variable
"SUM," rather than directly in "C(IROW,JCOL)." We did this for two
reasons. First, it is somewhat more efficient, since BASIC needs to calculate
the location of C(IROW,JCOL) only l*n times, rather than l*m*n times.
Second, and far more important, accuracy is improved greatly by
accumulating the sum in double precision even if it is to be stored later as a
single precision variable.
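The accuracy point can be demonstrated with a small simulation. The Python sketch below is our own illustration and does not reproduce BASIC's own floating point format. It uses the standard struct module to round a running sum to IEEE single precision after every addition, and compares the result with a sum kept in double precision. The data are single precision values, as they would be in a BASIC single precision array.

```python
import struct

def to_single(x):
    """Round a Python float (a double) to the nearest single precision value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def sum_products(x, y, single):
    """Inner product whose running sum is rounded to single precision after
    every addition when single is True, and kept in double otherwise."""
    s = 0.0
    for a, b in zip(x, y):
        s += a * b
        if single:
            s = to_single(s)
    return s

n = 100000
x = [to_single(0.1)] * n          # single precision data
y = [1.0] * n
exact = n * x[0]                  # exact value of this inner product
err_single = abs(sum_products(x, y, True) - exact)
err_double = abs(sum_products(x, y, False) - exact)
print(err_single, err_double)     # the single precision accumulator drifts
```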

Because of the central role of inner products and matrix multiplications 
in numerical computation, accuracy and speed are vital. We present 
several 8087 routines written with these objectives in mind. Our first 
routine forms the inner product of two one-dimensional arrays. 
; SUBROUTINE INPROD(A,B,C,N)
; ASSUMPTIONS: A,B ARE SINGLE PRECISION N ARRAYS
;              C IS A DOUBLE PRECISION SCALAR
;              N IS AN INTEGER

         PUBLIC  INPROD
CSEG     SEGMENT 'CODE'
         ASSUME  CS:CSEG
INPROD   PROC    FAR
         PUSH    BP
         MOV     BP,SP
         MOV     BX,[BP]+6           ;BX = ADDR(N)
         MOV     CX,[BX]             ;CX = N
         MOV     SI,[BP]+12          ;SI = ADDR(A)
         MOV     DI,[BP]+10          ;DI = ADDR(B)
         MOV     BX,0
         FLDZ                        ;SET RUNNING SUM = 0
         JCXZ    DONE
ADD_LOOP:
         FLD     DWORD PTR [BX][SI]  ;LOAD A(I)
         FMUL    DWORD PTR [BX][DI]  ;MULTIPLY BY B(I)
         FADDP   ST(1),ST            ;SUM = SUM + A(I)*B(I)
         ADD     BX,4                ;READY FOR NEXT ELEMENT
         LOOP    ADD_LOOP
DONE:
         MOV     BX,[BP]+8           ;BX = ADDR(C)
         FSTP    QWORD PTR [BX]      ;C = INNER PRODUCT
         POP     BP
         FWAIT
         RET     8
INPROD   ENDP
CSEG     ENDS
         END


Routine INPROD takes about 59 microseconds per array element. 

You might expect our next step would be an 8087 routine to multiply 
two matrices. Instead of proceeding directly to a matrix multiplication 
program, we are going to take a short strategic detour. A matrix
multiplication subroutine presents two difficulties. First, writing such a routine
is complicated by the need to keep track of too many indices. As you 
can see from the BASIC program above, the program needs to remember 
IROW, JCOL, K, L, M, N and the locations of A, B, and C. Using a direct 
approach, we would run out of registers rather quickly. Second, a 
straightforward matrix multiplication routine could be used only on one 
specific argument type; for example, multiplying two single precision 
matrices and returning a single precision result. 

Our strategic approach is to write a very general inner product routine
upon which we can build more complicated programs. Subroutine GINP,
below, calculates the inner product of two n-element arrays. The result
is left on the top of the 8087 stack. In addition to specification of the
input vectors, GINP accepts two kinds of options. The first option allows
us to specify either single or double precision input arrays. The second
option allows us to tell GINP how far apart in memory the elements of
each array are spaced. Thus if array A has a "skip" parameter of one,
the elements are stored sequentially. If the skip parameter is two, then
elements are stored in every other location, leaving a four-byte gap between
the elements used in a single precision array and an eight-byte gap
in a double precision array.

Of what use is the "skip" parameter? Think about accessing a row of
a matrix. Sequential elements in a row of an m by n matrix are located
4*m bytes apart. Thus we can move across a row of an m by n matrix by
specifying m, the number of rows, as the skip parameter.
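To make TYPE and SKIP concrete, here is a Python model of *what* GINP computes, not *how* it computes it. It assumes that memory is a bytes object holding little-endian IEEE values, which is the format the 8087 reads. The FORMATS table, the function name, and the separate buffer arguments are our own illustration.

```python
import struct

FORMATS = {4: '<f', 8: '<d'}     # element length in bytes -> IEEE struct format

def ginp(mem_a, addr_a, mem_b, addr_b, typea, typeb, skipa, skipb, n):
    """Inner product of n elements, taking every skipa-th element of A and
    every skipb-th element of B; typea/typeb give element lengths (4 or 8)."""
    step_a = typea * skipa       # bytes between successive A elements
    step_b = typeb * skipb       # bytes between successive B elements
    total = 0.0
    for i in range(n):
        a = struct.unpack_from(FORMATS[typea], mem_a, addr_a + i * step_a)[0]
        b = struct.unpack_from(FORMATS[typeb], mem_b, addr_b + i * step_b)[0]
        total += a * b
    return total

# A: the 2 by 2 example (4 2 / 2 -2), single precision, column by column.
mem_a = struct.pack('<4f', 4.0, 2.0, 2.0, -2.0)
# x: a double precision 2-vector.
mem_x = struct.pack('<2d', 4.5, 0.0)
# Row 1 of A starts 4 bytes in; its elements are 2 apart (skip = rows of A).
print(ginp(mem_a, 4, mem_x, 0, 4, 8, 2, 1, 2))   # 9.0
```

Mixing a single precision A with a double precision x in one call is the kind of flexibility that lets one routine serve many argument types.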


; SUBROUTINE GINP(A,B,TYPEA,TYPEB,SKIPA,SKIPB,N)
; ASSUMPTIONS: A,B ARE ADDRESSES OF N-ARRAYS IN DATA SEGMENT
;              TYPEA,TYPEB,SKIPA,SKIPB,N ARE INTEGERS
; NOTE THIS PROCEDURE CANNOT BE CALLED FROM BASIC:
;   IT FINDS ITS ARGUMENTS ON THE STACK, NOT THEIR ADDRESSES
;   THERE MUST BE AT LEAST 2 FREE LOCATIONS ON THE 8087 STACK
;   AND AT LEAST 14 FREE BYTES ON THE MEMORY STACK
; GINP RETURNS THE INNER PRODUCT OF A AND B ON THE 8087 STACK
; GINP TAKES EVERY SKIPA-TH ELEMENT OF A AND
;   EVERY SKIPB-TH ELEMENT OF B

            ASSUME  CS:CSEG
GINP        PROC    NEAR
            PUSH    BP
            MOV     BP,SP               ;THIS IS A NEAR PROCEDURE:
                                        ;ARGUMENTS BEGIN AT [BP]+4
            PUSH    AX
            PUSH    BX
            PUSH    CX
            PUSH    DX
            PUSH    SI
            PUSH    DI
            FLDZ                        ;SET RUNNING SUM = 0
            MOV     CX,[BP]+4           ;CX = N
            MOV     SI,[BP]+16          ;SI = ADDR(A)
            MOV     DI,[BP]+14          ;DI = ADDR(B)
            MOV     AX,[BP]+10          ;AX = TYPEB
            MUL     WORD PTR [BP]+6     ;AX = TYPEB*SKIPB
            MOV     BX,AX               ;BX = B ELEMENT DISTANCE
            MOV     AX,[BP]+12          ;AX = TYPEA
            MUL     WORD PTR [BP]+8     ;AX = A ELEMENT DISTANCE
            JCXZ    DONE
GINP_LOOP:
            CMP     WORD PTR [BP]+12,4  ;IS A SINGLE?
            JNE     A_DOUBLE
            FLD     DWORD PTR [SI]      ;LOAD SINGLE A(I)
            JMP     MULT_B
A_DOUBLE:
            FLD     QWORD PTR [SI]      ;LOAD DOUBLE A(I)
MULT_B:
            CMP     WORD PTR [BP]+10,4  ;IS B SINGLE?
            JNE     B_DOUBLE
            FMUL    DWORD PTR [DI]      ;MULTIPLY SINGLE B(I)
            JMP     NEXT_ELEMENT
B_DOUBLE:
            FMUL    QWORD PTR [DI]      ;MULTIPLY DOUBLE B(I)
NEXT_ELEMENT:
            FADDP   ST(1),ST            ;SUM = SUM + A(I)*B(I)
            ADD     SI,AX
            ADD     DI,BX
            LOOP    GINP_LOOP
DONE:
            POP     DI
            POP     SI
            POP     DX
            POP     CX
            POP     BX
            POP     AX
            POP     BP
            RET     14
GINP        ENDP

Subroutine GINP is written as a NEAR procedure. This means it cannot
be called directly from BASIC. However, it also means that GINP is
automatically relocatable. Below, we write a FAR procedure, GINPROD,
to call GINP from BASIC. Because an 8088 NEAR call jumps to a location
relative to the current value in the instruction pointer, GINPROD and
GINP can be moved together without changing the CALL instruction in
GINPROD.

GINP should be assembled together with GINPROD and any other
routines which call GINP. This helps ensure that our dynamic relocation
scheme will function properly. For this same reason, we have omitted
the PUBLIC and SEGMENT/ENDS statements, as we will with all NEAR
procedures. In fact, the most convenient way to use our matrix routines
is to combine them all into one assembly language package. Combining
the routines makes it easy for them to share the same copy of GINP and
the scratch space we define in GINPROD. (We'll assume that you combine
the routines this way and won't set up separate scratch space areas
for each.)
Since GINP won't be called from BASIC, we have used slightly different
parameter passing conventions for convenience. The addresses of the
two arrays, A and B, are pushed onto the stack first. They are followed by
the values (rather than the addresses) of the "types" of A and B (four
for single precision or eight for double precision), the skip parameters
for A and B, and the value of N. Since GINP is a NEAR procedure, the
parameters begin in the stack at [BP]+4 rather than [BP]+6. GINP saves
registers on the 8088 stack, and expects that any routine calling it will
leave at least seven words free on the stack. The calling routine should
set up its own stack area rather than rely on the area provided by BASIC.

Procedure GINP uses about 125 microseconds for overhead (finding
addresses and so forth) plus 59 microseconds for each array element.

Routine GINPROD makes GINP accessible from BASIC. GINPROD 
returns, in C, the double precision value of the inner product. 

; SUBROUTINE GINPROD(A,B,C,TYPEA,TYPEB,SKIPA,SKIPB,N)
; ASSUMPTIONS: A,B ARE N-ARRAYS
;              C IS A DOUBLE PRECISION SCALAR
;              TYPEA,TYPEB,SKIPA,SKIPB,N ARE INTEGERS
; THIS SUBROUTINE CALLS THE INTERNAL SUBROUTINE GINP

            PUBLIC  GINPROD
CSEG        SEGMENT 'CODE'
            ASSUME  CS:CSEG,ES:ESEG
FIRST_INST  EQU     THIS WORD
GINPROD     PROC    FAR
            PUSH    BP
            MOV     BP,SP
                                        ;SET UP STACK AREA IN ESEG
            PUSH    ES
            CALL    NEXT
NEXT:       POP     AX
            SUB     AX,(OFFSET NEXT)-(OFFSET FIRST_INST)
            MOV     CL,4
            SHR     AX,CL
            MOV     BX,CS
            ADD     BX,ESEG
            SUB     BX,CSEG
            ADD     AX,BX
            MOV     ES,AX
            MOV     LOCAL_SPACE,SS
            MOV     LOCAL_SPACE+2,SP
            MOV     AX,ES
            MOV     SS,AX
            MOV     SP,OFFSET STACK_TOP
                                        ;SET UP CALL PARAMETERS
                                        ;NOTICE THAT WE HAVE CHANGED THE SS
                                        ;REGISTER, SO WE HAVE TO TAKE ADVANTAGE
                                        ;OF THE FACT THAT BASIC SETS SS AND DS
                                        ;TO THE SAME LOCATION
            PUSH    WORD PTR DS:[BP]+20 ;ADDR(A)
            PUSH    WORD PTR DS:[BP]+18 ;ADDR(B)
            MOV     BX,DS:[BP]+14       ;TYPEA
            PUSH    WORD PTR [BX]
            MOV     BX,DS:[BP]+12       ;TYPEB
            PUSH    WORD PTR [BX]
            MOV     BX,DS:[BP]+10       ;SKIPA
            PUSH    WORD PTR [BX]
            MOV     BX,DS:[BP]+8        ;SKIPB
            PUSH    WORD PTR [BX]
            MOV     BX,DS:[BP]+6        ;N
            PUSH    WORD PTR [BX]
            CALL    GINP
            MOV     SS,LOCAL_SPACE      ;RESTORE BASIC'S STACK; LOAD SS FIRST,
            MOV     SP,LOCAL_SPACE+2    ;SINCE MOV SS HOLDS OFF INTERRUPTS
                                        ;UNTIL SP HAS ALSO BEEN LOADED
            MOV     BX,[BP]+16          ;BX = ADDR(C)
            FSTP    QWORD PTR [BX]      ;STORE C
            POP     ES
            POP     BP
            FWAIT
            RET     16
GINPROD     ENDP
CSEG        ENDS

ESEG        SEGMENT 'DATA'
            DW      20 DUP (?)
STACK_TOP   EQU     THIS WORD
LOCAL_SPACE DW      20 DUP (?)
ESEG        ENDS



One programming "trick" bears special attention here. The stack area
provided by BASIC when GINPROD is called may have only eight words
on it. Since this isn't enough, GINPROD sets up its own stack segment
in the ESEG area. GINPROD changes the stack segment register, SS, to
point to this area. Once SS has been changed, we need to use some other
segment register when retrieving arguments from BASIC. In GINPROD,
we use the DS register since BASIC sets SS and DS to the same value.
This works quite well when GINPROD is called from BASIC, but some
other method might be necessary if GINPROD is used with another
language.




GINPROD leads immediately to a fast BASIC routine for matrix
multiplication.




10 DEFINT I-N
20 DEFDBL S
30 DIM A(L-1,M-1),B(M-1,N-1),C(L-1,N-1)
35 IONE=1 : ITYPE=4
40 FOR IROW=0 TO L-1
50 FOR JCOL=0 TO N-1
55 CALL GINPROD(A(IROW,0),B(0,JCOL),SUM,ITYPE,ITYPE,L,IONE,M)
56 REM FIND INNER PRODUCT OF ROW IROW OF A WITH
57 REM COLUMN JCOL OF B RETURNING THE ANSWER IN SUM
58 REM NOTE ITYPE=4 INDICATES SINGLE PRECISION
60 REM SUM=0
70 REM FOR K=0 TO M-1
80 REM SUM=SUM+A(IROW,K)*B(K,JCOL)
90 REM NEXT K
100 C(IROW,JCOL)=SUM
110 NEXT JCOL
120 NEXT IROW

For convenient comparison, we have adapted the earlier BASIC program
for matrix multiplication by adding statements 35 and 55-58 and
changing lines 60 through 90 into REMARKS. This program directly takes the
inner product of each row of A with each column of B.

How much time do we save by multiplying matrices using GINPROD
instead of straight BASIC code? For large m, the running time of both
programs is roughly proportional to l*m*n. The constant of proportionality
is about 9600 microseconds for BASIC. Using GINPROD, the constant of
proportionality falls to 61 microseconds. Thus, multiplying two 50 by 50
matrices takes about 20 minutes in BASIC without the 8087. Using the 8087,
the program takes about eight seconds.
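These estimates can be checked with a small calculation. The Python cost model below uses only the two constants quoted above. Like the estimate in the text, it ignores the quadratic overhead discussed next. The function name is illustrative.

```python
def multiply_seconds(l, m, n, usec_per_term):
    """Rough time for C = AB when the cost is proportional to l*m*n."""
    return usec_per_term * l * m * n / 1_000_000

basic = multiply_seconds(50, 50, 50, 9600)    # 1200.0 seconds
with_8087 = multiply_seconds(50, 50, 50, 61)  # 7.625 seconds
print(basic / 60, with_8087)                  # 20.0 7.625
```

The model confirms the 20 minute and eight second figures.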

Suppose the middle index, m, is small compared to l and n. Lines 70,
80, and 90 use time proportional to l*m*n. Lines 40-60 and 100-120 execute
in time proportional to l*n. Ordinarily in timing analysis, if a cubic term,
such as l*m*n, is present, we drop quadratic terms, such as l*n. If m is
small, the quadratic terms become important. For example, if m = 1, the
program spends as much time in lines 40-60 and 100-120 as in 70-90; use
of the routine GINPROD doesn't speed up anything at all.

Speed considerations thus suggest a pure 8087 routine for matrix
multiplication. Routine MATMULT essentially imitates the BASIC code above.

;SUBROUTINE MATMULT(A,B,C,L,M,N)
;ASSUMPTIONS: A,B,C ARE SINGLE PRECISION MATRICES
;       A IS L BY M
;       B IS M BY N
;       C IS L BY N
;       L,M,N ARE INTEGERS
;THIS SUBROUTINE PERFORMS THE MATRIX MULTIPLICATION C=AB
;       SUCCESSIVE ROWS OF A ARE MULTIPLIED BY THE FIRST COLUMN OF B
;       THEN REPEAT FOR SECOND COLUMN, ETC.
;
        PUBLIC  MATMULT
CSEG    SEGMENT 'CODE'
        ASSUME  CS:CSEG,ES:ESEG
FIRST_INST      EQU     THIS WORD
MATMULT PROC    FAR
        PUSH    BP
        MOV     BP,SP
;SET UP STACK AREA IN ESEG
        PUSH    ES
        CALL    NEXT
NEXT:   POP     AX
        SUB     AX,(OFFSET NEXT)-(OFFSET FIRST_INST)
        MOV     CL,4
        SHR     AX,CL
        MOV     BX,CS
        ADD     BX,ESEG
        SUB     BX,CSEG
        ADD     AX,BX
        MOV     ES,AX
;
        MOV     LOCAL_SPACE,SS
        MOV     LOCAL_SPACE+2,SP
        MOV     AX,ES
        MOV     SS,AX
        MOV     SP,OFFSET STACK_TOP
;
;TO CALL GINP WE MUST PUSH ONTO THE STACK:
;       A(I,0)
;       B(0,J)
;       4               (ELEMENT LENGTH OF A)
;       4               (ELEMENT LENGTH OF B)
;       L               (SKIP FOR A)
;       1               (SKIP FOR B)
;       M               (NUMBER OF TERMS)
;ON RETURN THE RESULT GOES IN C(I,J)
;
;USE SOME LOCAL STORAGE TO SAVE ADDRESSES OF
;       A(I,0) B(0,J) C(I,J)
SOME_SPACE      EQU     LOCAL_SPACE+4
ADDRA_HOLD      EQU     SOME_SPACE
ADDRB_HOLD      EQU     ADDRA_HOLD+2
L_HOLD          EQU     ADDRB_HOLD+2
MM_HOLD         EQU     L_HOLD+2
M_HOLD          EQU     MM_HOLD+2
N_HOLD          EQU     M_HOLD+2
;
        MOV     BX,DS:[BP]+16   ;BX=ADDR(A(0,0))
        MOV     ADDRA_HOLD,BX
        MOV     SI,DS:[BP]+14   ;SI=ADDR(B(0,0))
        MOV     ADDRB_HOLD,SI
        MOV     DI,DS:[BP]+12   ;DI=ADDR(C(0,0))
        MOV     BX,DS:[BP]+10   ;BX=ADDR(L)
        MOV     AX,[BX]
        MOV     L_HOLD,AX       ;L_HOLD HAS L
        MOV     BX,DS:[BP]+8    ;BX=ADDR(M)
        MOV     DX,[BX]         ;DX=M
        MOV     M_HOLD,DX
        MOV     MM_HOLD,DX
        SHL     MM_HOLD,1       ;MM_HOLD=4*M
        SHL     MM_HOLD,1
        MOV     BX,DS:[BP]+6    ;BX=ADDR(N)
        MOV     CX,[BX]         ;CX=N
        MOV     N_HOLD,CX
        MOV     AX,4            ;SAVE USEFUL CONSTANTS
        MOV     BX,1
;
COL_LOOP:
        CMP     N_HOLD,0        ;COLUMNS DONE?
        JE      DONE
        MOV     SI,ADDRA_HOLD
        MOV     CX,L_HOLD
ROW_LOOP:
        PUSH    SI              ;A(I,0)
        PUSH    ADDRB_HOLD      ;B(0,J)
        PUSH    AX              ;4
        PUSH    AX              ;4
        PUSH    L_HOLD          ;L
        PUSH    BX              ;1
        PUSH    DX              ;M
        CALL    GINP
        FSTP    DWORD PTR [DI]
        ADD     DI,4            ;NEXT C
        ADD     SI,4            ;NEXT A
        LOOP    ROW_LOOP        ;NEXT ROW
        MOV     SI,MM_HOLD      ;SKIP TO NEXT COLUMN
        ADD     ADDRB_HOLD,SI   ;NEXT B
        DEC     N_HOLD
        JMP     COL_LOOP
;
DONE:   MOV     SP,LOCAL_SPACE+2
        MOV     SS,LOCAL_SPACE
        POP     ES
        FWAIT
        POP     BP
        RET     12
MATMULT ENDP
CSEG    ENDS
        END
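To make the loop structure of MATMULT easier to follow, here is a Python sketch of the same traversal. It assumes, as the listing's address arithmetic implies (A advances by one element to reach the next row, B by M elements to reach the next column), that matrices are stored column by column in flat arrays. Offsets are counted in elements rather than bytes, and `ginp` is a hypothetical stand-in for GINP's strided inner product, not a transcription of the 8087 code.

```python
def ginp(a, a_start, b, b_start, skip_a, skip_b, count):
    """Strided inner product: sum of a[a_start + k*skip_a] * b[b_start + k*skip_b]."""
    total = 0.0
    for k in range(count):
        total += a[a_start + k * skip_a] * b[b_start + k * skip_b]
    return total

def matmult(a, b, l, m, n):
    """C = AB for column-major flat lists: A is l by m, B is m by n, C is l by n."""
    c = [0.0] * (l * n)
    di = 0                      # like DI: walks C(0,0), C(1,0), ... contiguously
    b_col = 0                   # like ADDRB_HOLD: start of column j of B
    for j in range(n):          # COL_LOOP
        si = 0                  # like SI: A(i,0), the start of row i of A
        for i in range(l):      # ROW_LOOP
            # a row of A has stride l; a column of B has stride 1
            c[di] = ginp(a, si, b, b_col, l, 1, m)
            di += 1             # next C
            si += 1             # next row of A
        b_col += m              # skip to the next column of B
    return c

# A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], stored column by column
c = matmult([1, 3, 2, 4], [5, 7, 6, 8], 2, 2, 2)
```

The result, read column by column, is 19, 43, 22, 50, that is, AB = [[19, 22], [43, 50]].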




MATMULT executes in about 211*l*n + 59*l*m*n microseconds. In the
worst case, m = 1 and large l*n, MATMULT uses about 270 microseconds
per multiply-add. Even though 80 percent of the 270 microseconds is
overhead, MATMULT is still over 100 times faster than BASIC. By the time m
is as large as 20, the cost falls to about 70 microseconds per multiply-add,
which is roughly 80 percent of maximum hardware speed. Adaptation of
MATMULT to double precision arguments is straightforward.
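The per-term figures can be read off the timing formula. Because both terms scale with l*n, the cost per multiply-add depends only on m. A small Python check (the function names are ours):

```python
def matmult_us(l, m, n):
    """Approximate MATMULT running time in microseconds (formula from the text)."""
    return 211 * l * n + 59 * l * m * n

def us_per_multiply_add(m):
    # Both terms scale with l*n, so the per-term cost depends only on m.
    return matmult_us(1, m, 1) / m

worst = us_per_multiply_add(1)     # m = 1: each element of C is one product
overhead = 211 / worst             # share of that time that is fixed overhead
typical = us_per_multiply_add(20)  # (211 + 59*20) / 20
```

At m = 1 this gives 270 microseconds, of which 211 (about 78 percent) is overhead; at m = 20 it gives 1391/20, about 69.6 microseconds.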
GINPROD allows us to easily create many variants of matrix
multiplication. Suppose we want to multiply the transpose of a matrix A by
a matrix B, as in C = A'B, where A is m by l and B is m by n. Row i of
A' is column i of A, so we can call GINPROD with a "skip" of 1 for A to
step along a row of A'.

10 DEFINT I-N
20 DEFDBL S
30 DIM A(M-1,L-1),B(M-1,N-1),C(L-1,N-1)
35 I1=1 : ITYPE=4
40 FOR IROW=0 TO L-1
50 FOR JCOL=0 TO N-1
55 CALL GINPROD(A(0,IROW),B(0,JCOL),SUM,ITYPE,ITYPE,I1,I1,M)
60 REM SUM=0
70 REM FOR K=0 TO M-1
80 REM SUM=SUM+A(K,IROW)*B(K,JCOL)
90 REM NEXT K
100 C(IROW,JCOL)=SUM
110 NEXT JCOL
120 NEXT IROW

A slightly simpler program could be written using INPROD rather than 
GINPROD, but the method here allows double precision matrices and 
is easily adaptable to problems such as C = AB', which require the matrix 
to be processed by row rather than column. 
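Here is a Python sketch of the same idea, again assuming column-by-column storage in flat arrays and using a hypothetical strided inner product `ginp` in place of GINPROD. Column i of the m by l matrix A starts at offset i*m, so row i of A' is reached with a skip of 1:

```python
def ginp(a, a_start, b, b_start, skip_a, skip_b, count):
    """Strided inner product, standing in for GINPROD."""
    return sum(a[a_start + k * skip_a] * b[b_start + k * skip_b]
               for k in range(count))

def transpose_times(a, b, m, l, n):
    """C = A'B; A is m by l, B is m by n, C is l by n, all column-major."""
    c = [0] * (l * n)
    for j in range(n):
        for i in range(l):
            # row i of A' = column i of A: start i*m, skip 1
            # column j of B: start j*m, skip 1
            c[i + j * l] = ginp(a, i * m, b, j * m, 1, 1, m)
    return c

# A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]], stored column by column
c = transpose_times([1, 3, 2, 4], [5, 7, 6, 8], 2, 2, 2)
```

Here A' = [[1, 3], [2, 4]], so A'B = [[26, 30], [38, 44]], which comes back column by column as 26, 38, 30, 44.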

Solving Systems of Linear Equations 

This is a good place to pause in your reading. We spend the rest of this
chapter on linear algebra and on writing BASIC programs for solving
systems of linear equations and inverting matrices. Our next 8087 program
doesn't appear until Chapter 11. If your main interest is the 8087
aspect of these problems, you can skim the rest of this chapter quickly.

The next few pages move very fast. You can spend most of a course 
in college learning about linear equations. The next few pages are really 
more of a quick review than a proper introduction to the subject. If you're 
new to the topic—or if it's been a long time since you last saw the 
subject—spend some time playing with the BASIC programs. One of the 
nice things about exploring with a personal computer is that your "study" 
can be as fast or as slow as you please. 

Equation Manipulation 

Return now to our example of two linear equations in two unknowns. 
The equations to be satisfied are: 

$$
\begin{aligned}
18 &= 4x_1 + 2x_2 \\
9 &= 2x_1 - 2x_2
\end{aligned}
$$

which can also be written

$$y = Ax$$

For a given y, what value of x makes both equations true
simultaneously? We solve for x by making judicious use of the following
theorems.

1. If we multiply both sides of a true equation by a constant, the resulting 
equation is also true. 

2. If we add one true equation to another, the resulting equation is also true. 

3. We can always exchange the position of two equations. 

Clever application of these principles allows us to easily solve systems 
of linear equations. Consider applying the following transformations to 
our example system. 

1. Multiply the top equation by -1/2 and add the result to the second
equation. The transformed system looks like this:

$$
\begin{aligned}
18 &= 4x_1 + 2x_2 \\
0 &= 0x_1 - 3x_2
\end{aligned}
$$

2. By inspecting the bottom equation, we see that $x_2$ equals 0. Solving
backwards, we set $x_2$ to zero in the top equation and see immediately
that $x_1$ equals 18/4, or 4.5.


Matrix Manipulation 

These steps generalize to a two-step procedure for solving systems of 
linear equations in terms of matrices. 

1. Reduce the system to triangular form. Multiply the first equation by 
a constant and add the result to the second equation so as to produce 
a zero in column 1, row 2. Multiply the first equation by a (different) 
constant and add the result to the third equation so as to produce 
a zero in column 1, row 3. Continue in this manner until the first 
column is all zeros below the diagonal. 

Now take the second equation, multiply it by a constant and add 
it to the third equation so as to produce a zero in column 2, row 3. 
Continue until the entire second column is zero below the diagonal. 
Apply this procedure repeatedly until the entire area below the 
diagonal equals zero. This sort of matrix, with all zeros below the
diagonal, is called upper triangular.

2. Back substitute. Take the transformed version of A and y and solve
for x by

$$x_n = \frac{y_n}{A_{n,n}}$$

$$x_{n-1} = \frac{y_{n-1} - A_{n-1,n}\,x_n}{A_{n-1,n-1}}$$

and so forth.
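The back-substitution rule translates directly into code. A minimal Python sketch, using exact fractions so the worked example comes out exactly; `back_substitute` is our name, and U is given as a list of rows:

```python
from fractions import Fraction

def back_substitute(u, y):
    """Solve Ux = y for upper triangular U (list of rows), last unknown first."""
    n = len(y)
    x = [0] * n
    for i in range(n - 1, -1, -1):
        # subtract the terms in unknowns already found, then divide by the diagonal
        s = sum(u[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / u[i][i]
    return x

# The reduced two-equation example: 18 = 4x1 + 2x2, 0 = -3x2
u = [[Fraction(4), Fraction(2)], [Fraction(0), Fraction(-3)]]
x = back_substitute(u, [Fraction(18), Fraction(0)])
```

Applied to the reduced two-equation example, it returns $x_1 = 9/2$ and $x_2 = 0$.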
Let's look at what our sample system looks like in terms of matrices. We
start with the original A and y.

$$
A = \begin{bmatrix} 4 & 2 \\ 2 & -2 \end{bmatrix}, \qquad
y = \begin{bmatrix} 18 \\ 9 \end{bmatrix}
$$

Now we begin the reduction process. Our first step multiplies the top
row of A and y by -1/2 and adds the result to the second row, giving us
new values of A and y.

$$
A = \begin{bmatrix} 4 & 2 \\ 0 & -3 \end{bmatrix}, \qquad
y = \begin{bmatrix} 18 \\ 0 \end{bmatrix}
$$

You can see why A is said to be in "triangular form." The non-zero 
entries form a triangle on and above the diagonal. 

Notice that A and y are changed. If you want to keep the original data 
intact, be certain to perform the reduction on a copy of the original 
matrices. 

The second step is to back-substitute. The matrix equation y = Ax still
applies to the new versions of A and y. Starting at the bottom and working
up we have

$$0 = (0)x_1 + (-3)x_2$$

so $x_2 = 0$. Now we can substitute this into the first equation.

$$18 = (4)x_1 + (2)(0)$$

so $x_1$ equals (18-0)/4, or 4.5.

In theory, only one thing can go wrong with this procedure. Suppose 
that at some step the equation we are using to produce zeros below the 
diagonal has a zero as its own diagonal element. (This diagonal element 
is called the pivot element.) In this case, the equation cannot be used to 
eliminate the elements below it and the program stops. The solution to 
this problem is to exchange the offending equation with another so as 
to obtain a non-zero pivot. (Implementation of this solution is deferred 
until the next chapter.) If the entire column equals zero, the system of 
equations and the matrix A are said to be singular. The system of equations 
does not have a unique solution. 

This method of solving linear systems is called Gaussian elimination. 
While not the best computational method (better ones are introduced in 
the next chapter), it is the most straightforward. The following BASIC
program implements Gaussian elimination. Notice that the original
contents of A and y are replaced by transformed values.
5 REM PROGRAM GAUSS
10 DEFINT I-N
20 DIM A(N-1,N-1),Y(N-1),X(N-1)
25 REM BE SURE A,Y AND N ARE DEFINED
30 FOR IEQ=0 TO N-1
40 IF A(IEQ,IEQ)=0 THEN PRINT "ZERO PIVOT AT";IEQ:STOP
50 FOR JROW=IEQ+1 TO N-1
60 FACTOR=-A(JROW,IEQ)/A(IEQ,IEQ)
70 Y(JROW)=Y(JROW)+FACTOR*Y(IEQ)
80 FOR K=IEQ TO N-1
90 A(JROW,K)=A(JROW,K)+FACTOR*A(IEQ,K)
100 NEXT K
110 NEXT JROW
120 NEXT IEQ
130 REM
140 REM A IS NOW UPPER TRIANGULAR
150 REM
160 X(N-1)=Y(N-1)/A(N-1,N-1)
170 FOR IEQ=(N-1)-1 TO 0 STEP -1
180 SUM=0
190 FOR K=IEQ+1 TO N-1
200 SUM=SUM+A(IEQ,K)*X(K)
210 NEXT K
220 X(IEQ)=(Y(IEQ)-SUM)/A(IEQ,IEQ)
230 NEXT IEQ

How long does it take to solve a system using Gaussian elimination?
The outermost loop, the IEQ loop, is executed N-1 times. The next loop,
the JROW loop, is done N-1 times for the first IEQ, N-2 for the second,
and so forth. So the JROW loop is executed approximately $n^2/2$ times.
The inner-most loop, K, executes N times per JROW for the first IEQ,
N-1 times per JROW for the second IEQ, for a total of about $n^3/3$
operations. In total, the time required to solve a system of n equations is
proportional to $n^3/3$, plus a small factor proportional to $n^2/2$.
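The operation count can be checked by simply counting how often the body of the K loop (line 90) runs in PROGRAM GAUSS. The Python sketch below mirrors the three loop bounds (the function names are ours); the count works out to exactly $(n^3 - n)/3$, which agrees with $n^3/3$ to leading order.

```python
def gauss_inner_count(n):
    """Number of executions of line 90 (the K loop body) in PROGRAM GAUSS."""
    count = 0
    for ieq in range(n):                  # FOR IEQ=0 TO N-1
        for jrow in range(ieq + 1, n):    # FOR JROW=IEQ+1 TO N-1
            count += n - ieq              # FOR K=IEQ TO N-1
    return count

def jrow_count(n):
    """Number of passes through the JROW loop."""
    return sum(n - 1 - ieq for ieq in range(n))

inner_50 = gauss_inner_count(50)
jrow_50 = jrow_count(50)
```

For n = 50 the inner loop runs 41,650 times, against $50^3/3 \approx 41{,}667$, while the JROW loop runs only 1,225 times.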

The logical next step would be to prepare 8087 routines to speed up 
the program. Since better solution methods are proposed in the next 
chapter, introduction of more 8087 routines will be postponed until that 
point. However, here are a couple of suggestions in case you'd like to 
experiment. 

Almost the entire execution time is spent in lines 80, 90, and 100. These 
lines multiply a row vector by a scalar and then add two row vectors. 
The routines prepared in Chapter 9 will only multiply and add column 
vectors. However, these routines could easily be modified to include a 
skip parameter, so as to work on row vectors. We might replace lines 
80, 90, and 100 with lines something like this: 

25 DIM XTRAROW(N-1):IONE=1
85 K=(N-1)-IEQ+1
95 CALL MULTSC(A(IEQ,IEQ),FACTOR,XTRAROW(0),N,IONE,K)
105 CALL VADD(A(JROW,IEQ),XTRAROW(0),A(JROW,IEQ),N,IONE,N,K)
Lines 180-200 might be replaced with GINPROD for some further gain.

Using BASIC, solving a system of equations takes about $12500n^3/3$
microseconds. Solving a 50-equation system uses over eight minutes of
computer time. Replacing the inner-most loop with 8087 routines as
suggested will reduce execution time to about $100n^3/3$ microseconds. That
knocks solution time for a 50-equation system down to about five seconds.
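Multiplying out the two estimates for n = 50 (a Python sketch that keeps only the cubic terms):

```python
n = 50
# Cubic-term timing models from the text, converted from microseconds
basic_minutes = 12500 * n**3 / 3 / 1e6 / 60
fast_seconds = 100 * n**3 / 3 / 1e6
```

The BASIC figure comes to about 8.7 minutes. The cubic term alone gives about 4.2 seconds for the 8087 version, somewhat below the rounded "about five seconds" in the text; lower-order terms are ignored here.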


Solving Multiple Linear Systems 

We frequently want to solve a number of linear systems, sharing a common
A matrix but having different y vectors. In place of a single n by 1
column vector y, we can arrange m column vectors into an n by m matrix
Y. The solutions can be placed in an n by m matrix X. The entire set of
linear equations is then represented by the matrix equation

Y = AX 

Examination of the Gaussian elimination routine shows that all the
hard work, that is the order $n^3$ work, involves only the A matrix. If we
blindly apply the program above to each column of Y, execution time will
be of order $mn^3$. The revision below keeps the transformation of y and
backsolving for x out of the innermost loop.

10 DEFINT I-N
20 DIM A(N-1,N-1),Y(N-1,M-1),X(N-1,M-1)
25 REM BE SURE A,Y,N, AND M ARE DEFINED
30 FOR IEQ=0 TO N-1
40 IF A(IEQ,IEQ)=0 THEN PRINT "ZERO PIVOT AT";IEQ:STOP
50 FOR JROW=IEQ+1 TO N-1
60 FACTOR=-A(JROW,IEQ)/A(IEQ,IEQ)
70 FOR LEQ=0 TO M-1
80 Y(JROW,LEQ)=Y(JROW,LEQ)+FACTOR*Y(IEQ,LEQ)
90 NEXT LEQ
100 FOR K=IEQ TO N-1
110 A(JROW,K)=A(JROW,K)+FACTOR*A(IEQ,K)
120 NEXT K
130 NEXT JROW
140 NEXT IEQ
150 REM
160 REM A IS NOW UPPER TRIANGULAR
170 REM
180 FOR LEQ=0 TO M-1
190 X(N-1,LEQ)=Y(N-1,LEQ)/A(N-1,N-1)
200 FOR IEQ=(N-1)-1 TO 0 STEP -1
210 SUM=0
220 FOR K=IEQ+1 TO N-1
230 SUM=SUM+A(IEQ,K)*X(K,LEQ)
240 NEXT K
250 X(IEQ,LEQ)=(Y(IEQ,LEQ)-SUM)/A(IEQ,IEQ)
260 NEXT IEQ
270 NEXT LEQ

This version of Gaussian elimination executes in time proportional to $n^3$
plus $mn^2$. When n is large compared to m, this means an improvement
factor of roughly m over repeated Gaussian elimination! Notice that lines
70-90 are ripe for replacement by 8087 row operations.
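A Python sketch of the same organization, working on copies of lists of rows and, like the BASIC version, without pivoting (the names are ours). The update of every column of Y (lines 70-90) sits beside the K loop rather than inside it, and back substitution is repeated once per column:

```python
from fractions import Fraction

def gauss_multi(a, y):
    """Solve AX = Y for several right-hand sides (columns of Y); no pivoting."""
    n, m = len(a), len(y[0])
    a = [row[:] for row in a]                 # keep the caller's data intact
    y = [row[:] for row in y]
    for ieq in range(n):
        if a[ieq][ieq] == 0:
            raise ZeroDivisionError(f"zero pivot at {ieq}")
        for jrow in range(ieq + 1, n):
            factor = -a[jrow][ieq] / a[ieq][ieq]
            for leq in range(m):              # lines 70-90: every column of Y
                y[jrow][leq] += factor * y[ieq][leq]
            for k in range(ieq, n):           # lines 100-120: the order n^3 work
                a[jrow][k] += factor * a[ieq][k]
    x = [[0] * m for _ in range(n)]
    for leq in range(m):                      # back substitute each column
        for ieq in range(n - 1, -1, -1):
            s = sum(a[ieq][k] * x[k][leq] for k in range(ieq + 1, n))
            x[ieq][leq] = (y[ieq][leq] - s) / a[ieq][ieq]
    return x

F = Fraction
a = [[F(4), F(2)], [F(2), F(-2)]]
y = [[F(18), F(6)], [F(9), F(0)]]             # two right-hand sides as columns
x = gauss_multi(a, y)
```

The first column reproduces the earlier solution (4.5, 0); the second right-hand side, y = (6, 0), gives x = (1, 1).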

Space Efficient Gaussian Elimination

The Gaussian elimination program above solves multiple linear systems 
quickly, but requires a great deal of storage, since 2*m*n locations are 
allocated for Y and X. We frequently want to solve systems sequentially, 
so that only a single y and x need be stored. 

Gaussian elimination transforms y. As you can see in lines 70-90 above, 
the same factors are used to transform every column of Y. If we save the 
factors, we can, at a later stage, transform as many different y's as we 
like. 

At each step in the reduction, all the (lower) elements of y are
transformed. Suppose we save all the factors, labeling the factors from the
first step $f_{10}$, $f_{20}$, $f_{30}$, and so forth. The second step produces one less
factor. We label these $f_{21}$, $f_{31}$, $f_{41}$, and so forth. Arranging the columns
of factors into a matrix, we get

$$
F = \begin{bmatrix}
1 & 0 & 0 & 0 & \cdots \\
f_{10} & 1 & 0 & 0 & \cdots \\
f_{20} & f_{21} & 1 & 0 & \cdots \\
f_{30} & f_{31} & f_{32} & 1 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}
$$

The matrix of factors is lower triangular with ones along the diagonal.
We need a convenient place to store F for later use. As we reduce A, the
area below the diagonal fills with zeros. Since this lower part of A would
otherwise go to waste, we'll use it to store the part of F below the diagonal,
and remember that the remaining part of F is ones and zeros.

Suppose we label the transformed vector y, "y*." The reduction process
transforms y according to the following rules:

$$
\begin{aligned}
y^*_0 &= y_0 \\
y^*_1 &= y_1 + f_{10}\,y^*_0 \\
y^*_2 &= y_2 + f_{20}\,y^*_0 + f_{21}\,y^*_1 \\
y^*_3 &= y_3 + f_{30}\,y^*_0 + f_{31}\,y^*_1 + f_{32}\,y^*_2
\end{aligned}
$$

and so forth.
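The reduce-once, solve-many pattern can be sketched in Python as follows (names are ours; lists of rows; no pivoting). `reduce_in_place` leaves the FACTORs in A's lower triangle, `forward` applies the y* rules above, and `back` is ordinary back substitution:

```python
from fractions import Fraction

def reduce_in_place(a):
    """Reduce A to upper triangular form, storing each FACTOR below the diagonal."""
    n = len(a)
    for ieq in range(n):
        if a[ieq][ieq] == 0:
            raise ZeroDivisionError(f"zero pivot at {ieq}")
        for jrow in range(ieq + 1, n):
            factor = -a[jrow][ieq] / a[ieq][ieq]
            for k in range(ieq, n):
                a[jrow][k] += factor * a[ieq][k]
            a[jrow][ieq] = factor             # reuse the zeroed position
    return a

def forward(a, y):
    """y*_i = y_i + sum over k < i of f_ik * y*_k, with f read from A's lower part."""
    ystar = []
    for i in range(len(y)):
        ystar.append(y[i] + sum(a[i][k] * ystar[k] for k in range(i)))
    return ystar

def back(a, ystar):
    """Back substitution using only the upper triangle of the reduced A."""
    n = len(ystar)
    x = [0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (ystar[i] - s) / a[i][i]
    return x

def solve_reduced(a, y):
    return back(a, forward(a, y))

F = Fraction
a = reduce_in_place([[F(4), F(2)], [F(2), F(-2)]])
x1 = solve_reduced(a, [F(18), F(9)])
x2 = solve_reduced(a, [F(6), F(0)])
```

With A from the two-equation example, the reduced array holds the factor -1/2 below the diagonal; y = (18, 9) then gives x = (9/2, 0), and a second right-hand side, y = (6, 0), gives x = (1, 1) without reducing A again.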
The first step of our space efficient program is to reduce A to upper
triangular form and store the factors in A's lower triangle. Then, each
time we want to find x for a new y, we generate a new y* from the stored
factors and back substitute. The next set of BASIC code takes this
approach.

5 REM PROGRAM GAUSS-SE (SPACE EFFICIENT)
10 DEFINT I-N
20 DIM A(N-1,N-1),Y(N-1),X(N-1),YSTAR(N-1)
25 REM BE SURE A,Y, AND N ARE DEFINED
30 FOR IEQ=0 TO N-1
40 IF A(IEQ,IEQ)=0 THEN PRINT "ZERO PIVOT AT";IEQ:STOP
50 FOR JROW=IEQ+1 TO N-1
60 FACTOR=-A(JROW,IEQ)/A(IEQ,IEQ)
70 FOR K=IEQ TO N-1
80 A(JROW,K)=A(JROW,K)+FACTOR*A(IEQ,K)
90 NEXT K
100 A(JROW,IEQ)=FACTOR
110 NEXT JROW
120 NEXT IEQ
130 REM
140 REM A IS NOW UPPER TRIANGULAR
150 REM
160 YSTAR(0)=Y(0)
170 FOR IEQ=0 TO (N-1)-1
180 SUM=0
190 FOR K=0 TO IEQ
200 SUM=SUM+A(IEQ+1,K)*YSTAR(K)
210 NEXT K
220 YSTAR(IEQ+1)=Y(IEQ+1)+SUM
230 NEXT IEQ
240 X(N-1)=YSTAR(N-1)/A(N-1,N-1)
250 FOR IEQ=(N-1)-1 TO 0 STEP -1
260 SUM=0
270 FOR K=IEQ+1 TO N-1
280 SUM=SUM+A(IEQ,K)*X(K)
290 NEXT K
300 X(IEQ)=(YSTAR(IEQ)-SUM)/A(IEQ,IEQ)
310 NEXT IEQ

Lines 160-310 can be repeated for other y vectors as needed. Notice that 
lines 190-210 are really forming an inner product and could be replaced 
with 8087 code. 

In the next chapter, we will discuss more advanced methods of solving 
linear systems. 


Matrix Inversion 

Suppose we were faced with the scalar equation

$$y = Ax$$

and were asked to solve for x given y. We might write the answer as

$$x = y/A$$

or as

$$x = A^{-1}y$$

For a scalar equation, $A^{-1}$, pronounced "A inverse," is just 1/A. The
question arises as to whether there is a matrix we could label $A^{-1}$
such that $x = A^{-1}y$. Provided A is not singular, there is indeed such a matrix.

First, define the identity matrix as a square matrix with ones along the
diagonal and zeros off the diagonal. For example, if I is the 3 by 3 identity
matrix, then

$$
I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$


The identity matrix is analogous to a one in scalar multiplication. The
identities IX = X and XI = X hold for the identity matrix and any
conformable matrix X. For scalars, we say that $A^{-1}$ is the inverse of A if
$AA^{-1} = 1$. Analogously, for matrices we say

$A^{-1}$ is the matrix inverse of A if $AA^{-1} = I$.

(Note we are restricting our attention to square matrices. For a square
matrix, not only does $AA^{-1} = I$, but also $A^{-1}A = I$.)

How do we "invert" a matrix? The equation $I = AA^{-1}$ has precisely
the same form as the matrix equation Y = AX, where I is Y and $A^{-1}$ is
X. We can use our BASIC program above to reduce A to upper triangular
form and then back substitute for each column of the identity matrix.
Because of the special form of the identity matrix, we can calculate y*
without creating each y.

Assume we have executed the reduction part of the previous program.
The code below replaces lines 160 onward and calculates $A^{-1}$ in the
array AINV.

160 DIM AINV(N-1,N-1)
170 FOR LEQ=0 TO N-1
180 FOR IEQ=0 TO LEQ-1
190 YSTAR(IEQ)=0
200 NEXT IEQ
210 YSTAR(LEQ)=1
220 FOR IEQ=LEQ TO (N-1)-1
230 SUM=0
240 FOR K=0 TO IEQ
250 SUM=SUM+A(IEQ+1,K)*YSTAR(K)
260 NEXT K
270 YSTAR(IEQ+1)=SUM
280 NEXT IEQ
290 AINV(N-1,LEQ)=YSTAR(N-1)/A(N-1,N-1)
300 FOR IEQ=(N-1)-1 TO 0 STEP -1
310 SUM=0
320 FOR K=IEQ+1 TO N-1
330 SUM=SUM+A(IEQ,K)*AINV(K,LEQ)
340 NEXT K
350 AINV(IEQ,LEQ)=(YSTAR(IEQ)-SUM)/A(IEQ,IEQ)
360 NEXT IEQ
370 NEXT LEQ
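The same idea in Python, as a sketch (our names; no pivoting; exact fractions in the check): reduce A once, then build each column of the inverse from the corresponding column of I. Below position LEQ only the stored factors contribute to y*, because the rest of that column of I is zero.

```python
from fractions import Fraction

def reduce_in_place(a):
    """Upper triangle becomes the reduced A; lower triangle holds the FACTORs."""
    n = len(a)
    for ieq in range(n):
        if a[ieq][ieq] == 0:
            raise ZeroDivisionError(f"zero pivot at {ieq}")
        for jrow in range(ieq + 1, n):
            factor = -a[jrow][ieq] / a[ieq][ieq]
            for k in range(ieq, n):
                a[jrow][k] += factor * a[ieq][k]
            a[jrow][ieq] = factor
    return a

def invert(a):
    """One column of the inverse per column of I, as in lines 170-370."""
    n = len(a)
    r = reduce_in_place([row[:] for row in a])
    ainv = [[0] * n for _ in range(n)]
    for leq in range(n):
        ystar = [0] * n                       # zeros above position LEQ
        ystar[leq] = 1                        # one at position LEQ
        for i in range(leq + 1, n):           # y itself is zero below LEQ
            ystar[i] = sum(r[i][k] * ystar[k] for k in range(i))
        for ieq in range(n - 1, -1, -1):      # back substitution
            s = sum(r[ieq][k] * ainv[k][leq] for k in range(ieq + 1, n))
            ainv[ieq][leq] = (ystar[ieq] - s) / r[ieq][ieq]
    return ainv

F = Fraction
ainv = invert([[F(4), F(2)], [F(2), F(-2)]])
```

For the two-equation example this gives rows (1/6, 1/6) and (1/6, -1/3); multiplying back by A recovers the identity.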

We now have a "complete" set of routines to solve systems of linear 
equations and invert matrices. But these routines still leave a few things 
to be desired. 

• They stop if they hit a zero pivot. 

• They would be a lot faster if written in 8087 code. 

• They would be more accurate if higher precision arithmetic were 
used, but we do not want to sacrifice too much storage space. 

In the next chapter, we remedy these faults . . . and learn a few new 
tricks. 





Linear Systems and
Matrix Inversion:
More Advanced
Computational
Techniques


By the end of the last chapter, we had created a set of procedures for 
solving systems of linear equations and for handling the related operation 
of matrix inversion. These methods followed the logic of "school room" 
techniques. The methods we develop in this chapter are perhaps less
familiar, but they lend themselves well to highly accurate and highly
efficient 8087 implementation.

Our goals for this chapter are: 

• Fix the "zero pivot" problem. 

• Express the solution to a system of linear equations in terms of inner 
products, in order to take full advantage of the 8087's design. 

• Move our procedures from BASIC to 8087 code. 


The Cookbook—Chapter 11

Program: GAUSS-PP
Purpose: Solve linear equations by Gaussian elimination with partial pivoting.
Input: A—N by N coefficient matrix.
       Y—N vector.
       YSTAR—N vector; scratch space.
       N—number of rows and columns of A.
Output: X—N vector; X solves equations Y=AX.
        A—A replaced with permuted Gaussian reduction.
        INDEX—N vector showing row swaps.
Language: BASIC.

Program: CROUT-PP
Purpose: Perform Crout decomposition with partial pivoting.
Input: A—N by N coefficient matrix.
       N—number of rows and columns of A.
Output: A—A replaced with permuted Gaussian reduction.
        INDEX—N vector showing row swaps.
Language: BASIC.

Program: PIV
Purpose: Perform pivot step in Crout decomposition.
Input: A—N by N coefficient matrix.
       INDEX—integer N vector of row permutations.
       TYPEA—integer giving length of element of A.
       DIAG—integer index of column to be searched.
       N—integer number of rows and columns of A.
Output: INDEX—updated to reflect new pivot.
Language: 8087/8088 assembly language.
Note: NEAR procedure called by PIVOT and CROUTP.

Program: XINP
Purpose: Inner product with permuted column.
Input: A—N vector.
       B—permuted N vector.
       INDEX—integer N vector of row permutations for B.
       TYPE—integer giving length of element of A,B.
       SKIPA—integer "skip factor" (see text) for A.
       N—integer number of elements of A,B.
Output: 8087 register ST; ST = inner product A,B.
Language: 8087/8088 assembly language.
Note: NEAR procedure called by XINPROD and CROUTP.

Program: XINPROD
Purpose: Inner product with permuted column.
Call: CALL XINPROD(A(I,0),A(0,K),SUM,INDEX(0),TYPE,SKIPA,N).
Input: A—N vector.
       B—permuted N vector.
       INDEX—integer N vector of row permutations for B.
       TYPE—integer giving length of element of A,B.
       SKIPA—integer "skip factor" (see text) for A.
       N—integer number of elements of A,B.
Output: SUM—double precision scalar; SUM = inner product A,B.
Language: 8087/8088 assembly language.
Note: Requires NEAR procedure XINP.

Program: PIVOT
Purpose: Perform pivot step in Crout decomposition.
Call: CALL PIVOT(A(0,K),INDEX(0),TYPE,K,N).
Input: A—N by N coefficient matrix.
       INDEX—integer N vector of row permutations.
       TYPE—integer giving length of element of A.
       K—integer index of column to be searched.
       N—integer number of rows and columns of A.
Output: INDEX—updated to reflect new pivot.
Language: 8087/8088 assembly language.
Note: Requires NEAR procedure PIV.

Program: CROUTP
Purpose: Perform Crout decomposition with partial pivoting.
Input: A—N by N coefficient matrix.
       TYPE—integer giving length of element of A.
       N—integer number of rows and columns of A.
Output: INDEX—integer N vector of row permutations.
        IER—integer error flag, IER=-1 if A singular.
Language: 8087/8088 assembly language.
Note: NEAR procedure called by REDUCE.
      Requires NEAR procedures XINP and PIV.

Program: REDUCE
Purpose: Perform Crout decomposition with partial pivoting.
Call: CALL REDUCE(A(0,0),INDEX(0),TYPE,IER,N).
Input: A—N by N coefficient matrix.
       TYPE—integer giving length of element of A.
       N—integer number of rows and columns of A.
Output: INDEX—integer N vector of row permutations.
        IER—integer error flag, IER=-1 if A singular.
Language: 8087/8088 assembly language.
Note: Requires NEAR procedure CROUTP.

Program: SOLP
Purpose: Solve system of linear equations after Crout decomposition with
         partial pivoting.
Input: A—N by N matrix reduced by CROUTP with partial pivoting.
       Y—N vector.
       N—number of rows and columns of A.
       INDEX—N vector showing row swaps.
Output: X—N vector; X solves equations Y=AX.
Language: BASIC.

Program: SOL
Purpose: Solve system of linear equations after Crout decomposition with
         partial pivoting.
Input: A—N by N matrix reduced by CROUTP with partial pivoting.
       Y—N vector.
       INDEX—N vector showing row swaps.
       TYPEA—integer giving length of element of A.
       TYPEY—integer giving length of element of Y.
       TYPEX—integer giving length of element of X.
       N—integer number of rows and columns of A.
Output: X—N vector; X solves equations Y=AX.
Language: 8087/8088 assembly language.
Note: NEAR procedure called by SOLVE.

Program: SOLVE
Purpose: Solve system of linear equations after Crout decomposition with
         partial pivoting.
Call: CALL SOLVE(A(0,0),Y(0),X(0),INDEX(0),TYPEA,TYPEY,TYPEX,N).
Input: A—N by N matrix reduced by CROUTP with partial pivoting.
       Y—N vector.
       INDEX—N vector showing row swaps.
       TYPEA—integer giving length of element of A.
       TYPEY—integer giving length of element of Y.
       TYPEX—integer giving length of element of X.
       N—integer number of rows and columns of A.
Output: X—N vector; X solves equations Y=AX.
Language: 8087/8088 assembly language.
Note: Requires NEAR procedure SOL.

Program: INV
Purpose: Invert matrix.
Call: CALL INV(A(0,0),AINV(0,0),SCRATCH(0),INDEX(0),IER,TYPEA,N).
Input: A—N by N matrix.
       SCRATCH—single precision N vector of scratch space.
       TYPEA—integer giving length of element of A,AINV.
       N—integer number of rows and columns of A.
Output: AINV—N by N matrix; inverse of A.
        A—A replaced by Crout reduction.
        INDEX—integer N vector, permutations of reduced A.
        IER—integer error flag, IER=-1 if A singular.
Language: 8087/8088 assembly language.
Note: Requires NEAR procedures CROUTP and SOL.
Our original program for Gaussian elimination stops if it hits a zero
diagonal, or "pivot," element. A zero pivot may indicate a "mathematical,"
not "just a computational," problem, because the system of equations
may not have a unique solution. Consider the following linear
system as an example.

$$
\begin{aligned}
y_1 &= 2x_1 + 4x_2 \\
y_2 &= 4x_1 + 8x_2
\end{aligned}
$$

The "A" matrix is originally

$$A = \begin{bmatrix} 2 & 4 \\ 4 & 8 \end{bmatrix}$$

After one step of Gaussian elimination, the reduced matrix looks like
this:

$$A = \begin{bmatrix} 2 & 4 \\ 0 & 0 \end{bmatrix}$$

In addition, we have saved the FACTOR, -2.

At the next attempted step of Gaussian elimination, the program finds
that the second pivot, A(1,1), equals zero, and therefore stops with the
error message "ZERO PIVOT AT 1." The problem is a mathematical one. The
matrix A is singular, so the pair of equations does not have a unique
solution. In fact, an infinite set of combinations of $x_1$ and $x_2$ solve the
system if $y_2$ is twice $y_1$. No solution exists if $y_2$ isn't exactly twice $y_1$.

Consider the following rather different pair of equations.

$$
\begin{aligned}
y_1 &= 0x_1 + 1x_2 \\
y_2 &= 1x_1 + 0x_2
\end{aligned}
$$

The "A" matrix is originally

$$A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

Our Gaussian elimination routine hits a zero pivot (and stops) on
the very first step. This example demonstrates a computational, rather
than a mathematical, problem. By inspection, the solution to the system
is $x_1 = y_2$ and $x_2 = y_1$. The solution to the computational problem is
simple. We just reorder the equations so that the diagonal elements aren't
zero. Instead of solving the system as originally specified, we work on

$$
\begin{aligned}
y_2 &= 1x_1 + 0x_2 \\
y_1 &= 0x_1 + 1x_2
\end{aligned}
$$

with the "A" matrix

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Gaussian elimination proceeds smoothly as long as we keep track of
the order in which the equations are solved.
These two examples illustrate the general rules for dealing with zero
pivot elements.

1. If a zero pivot is encountered in equation i, exchange equation i with
an equation below it that does not have a zero in column i.

2. If all the remaining equations have a zero in column i, the matrix is singular.

3. Keep a record of all equation exchanges, so that we can later "unswap" the
equations, if desired.

In practice, we add two further steps.

4. Rather than actually exchanging the equations, create an array INDEX
such that INDEX(I) is the original number of the equation that now belongs
in row i.

Just as a zero pivot stops the program by creating an infinite FACTOR,
so too, a very small pivot creates a very large FACTOR and tends to
generate sizable round-off error. Accuracy can be considerably enhanced
by using the largest possible pivot.

5. Instead of exchanging equations only when a zero pivot is encountered (as
required by rule 1), search the remainder of column i for the element A(J,I)
with the largest absolute value, and exchange rows i and j.

The implementation of Gaussian elimination with rules 2 through 5 is 
called Gaussian elimination with partial pivoting. The BASIC code below 
rewrites the Gaussian elimination program of the previous chapter to 
include partial pivoting. Notice that instead of row I, we now reference 
row INDEX(I), but column j remains column j. 

5 REM PROGRAM GAUSS-PP
100 DEFINT I-N
200 DIM A(N-1,N-1),Y(N-1),X(N-1),YSTAR(N-1),INDEX(N-1)
210 FOR IEQ=0 TO N-1
220 INDEX(IEQ)=IEQ
230 NEXT IEQ
300 FOR IEQ=0 TO N-1
310 REM NOW SWAP ROWS
320 GOSUB 5000
330 SWAP INDEX(IEQ),INDEX(IBIGGEST)
340 IEQX=INDEX(IEQ)
400 IF A(IEQX,IEQ)=0 THEN PRINT "SINGULAR MATRIX";IEQ:STOP
500 FOR JROW=IEQ+1 TO N-1
510 JROWX=INDEX(JROW)
600 FACTOR=-A(JROWX,IEQ)/A(IEQX,IEQ)
700 FOR K=IEQ TO N-1
800 A(JROWX,K)=A(JROWX,K)+FACTOR*A(IEQX,K)
900 NEXT K
1000 A(JROWX,IEQ)=FACTOR
1100 NEXT JROW
1200 NEXT IEQ
1300 REM
1400 REM A IS NOW UPPER TRIANGULAR
1500 REM
1600 YSTAR(0)=Y(INDEX(0))
1700 FOR IEQ=1 TO N-1
1710 IEQX=INDEX(IEQ)
1800 SUM=0
1900 FOR K=0 TO IEQ-1
2000 SUM=SUM+A(IEQX,K)*YSTAR(K)
2100 NEXT K
2200 YSTAR(IEQ)=Y(IEQX)+SUM
2300 NEXT IEQ
2310 IN=INDEX(N-1)
2400 X(N-1)=YSTAR(N-1)/A(IN,N-1)
2500 FOR IEQ=(N-1)-1 TO 0 STEP -1
2510 IEQX=INDEX(IEQ)
2600 SUM=0
2700 FOR K=IEQ+1 TO N-1
2800 SUM=SUM+A(IEQX,K)*X(K)
2900 NEXT K
3000 X(IEQ)=(YSTAR(IEQ)-SUM)/A(IEQX,IEQ)
3100 NEXT IEQ
3200 STOP
5000 REM SUBROUTINE TO FIND LARGEST ELEMENT IN COLUMN
5100 BIGGEST=ABS(A(INDEX(IEQ),IEQ))
5200 IBIGGEST=IEQ
5300 FOR I=IEQ+1 TO N-1
5400 PIV=ABS(A(INDEX(I),IEQ))
5500 IF PIV>BIGGEST THEN BIGGEST=PIV:IBIGGEST=I
5600 NEXT I
5700 RETURN


This program performs the same number of multiplications and additions
as simple Gaussian elimination, but will run a little more slowly
due to increased overhead. The time spent selecting pivot rows is an
order $n^2$ operation, and is therefore negligible compared to the basic
reduction operation.
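For comparison, here is a Python sketch of GAUSS-PP (names ours). As in the BASIC, rows are never moved; INDEX records which stored row plays the role of row i, and the pivot search keeps the first of several equally large candidates. It raises ValueError where the BASIC program stops:

```python
from fractions import Fraction

def gauss_pp(a, y):
    """Gaussian elimination with partial pivoting via an INDEX array.

    a is a list of rows (copied, not modified); returns (x, index).
    """
    n = len(a)
    a = [row[:] for row in a]
    index = list(range(n))                       # lines 210-230
    for ieq in range(n):
        # subroutine 5000: first candidate with the largest |A(INDEX(I),IEQ)|
        big = max(range(ieq, n), key=lambda i: abs(a[index[i]][ieq]))
        index[ieq], index[big] = index[big], index[ieq]
        p = index[ieq]                           # IEQX: stored pivot row
        if a[p][ieq] == 0:
            raise ValueError(f"singular matrix at {ieq}")
        for jrow in range(ieq + 1, n):
            r = index[jrow]                      # JROWX
            factor = -a[r][ieq] / a[p][ieq]
            for k in range(ieq, n):
                a[r][k] += factor * a[p][k]
            a[r][ieq] = factor                   # keep the factor for y*
    ystar = [0] * n
    for ieq in range(n):                         # lines 1600-2300
        r = index[ieq]
        ystar[ieq] = y[r] + sum(a[r][k] * ystar[k] for k in range(ieq))
    x = [0] * n
    for ieq in range(n - 1, -1, -1):             # lines 2310-3100
        r = index[ieq]
        s = sum(a[r][k] * x[k] for k in range(ieq + 1, n))
        x[ieq] = (ystar[ieq] - s) / a[r][ieq]
    return x, index

F = Fraction
x, index = gauss_pp([[F(0), F(1)], [F(1), F(0)]], [F(3), F(5)])
```

On the swapped example above (A with rows 0, 1 and 1, 0, with y = (3, 5)), it returns INDEX = (1, 0) and x = (5, 3), that is, $x_1 = y_2$ and $x_2 = y_1$.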

We've fixed the "zero pivot" problem. Before moving on to the chapter's
other goals, we need to discuss some more mathematics. If you're
more interested in "how" than "why," skip ahead to the programs. It
will help to look at the BASIC programs before the 8087 programs, since
the former are easier to follow.

The Theory of “LU Decomposition” 

A number of advanced methods of solving systems of linear equations,
and consequently of matrix inversion, rely on the principle of "LU
decomposition." This principle states that a square matrix A can
(possibly after reordering its rows) be factored into a lower triangular
matrix L and an upper triangular matrix U such that L times U equals A.
There are many such decompositions. A particular method is arrived at by
our choice of restrictions on the contents of L or U.

LU methods all work in three steps. Suppose the initial problem is

$$y = Ax$$

1. Factor A into L and U. This decomposition is an order $n^3$ operation.
Now we have

$$y = (LU)x = L(Ux)$$

While LU is a square matrix, Ux is a column vector.

2. Solve the following system of equations for y*, where y* stands for
the column vector Ux. Solution of an upper (or lower) triangular system
of equations is an order $n^2$ operation.

$$y = Ly^*$$

3. Since $y^* = L^{-1}y = Ux$, finish by solving the order $n^2$ problem

$$y^* = Ux$$

for x.

Does this look like a roundabout method? It really isn't; it only seems
that way. For example, the reduction process of Gaussian elimination
leaves us with an upper triangular matrix that we might call U. If we
reverse the signs of the factors we store along the way and add a diagonal
of ones, we have a lower triangular matrix that we might call L. A bit of
calculation will show you that LU indeed equals A. Further, the steps that
solve for y* and x in the Gaussian elimination programs are exactly steps
2 and 3 above. So Gaussian elimination is actually a particular example of
using an LU decomposition.
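A quick numerical check of the claim, using the two-equation example (Python with exact fractions; names ours). Because the programs store FACTOR = -A(JROW,IEQ)/A(IEQ,IEQ), the entry of L below the diagonal is the stored factor with its sign reversed; with the sign as stored, the product does not reproduce A:

```python
from fractions import Fraction as F

def matmul(p, q):
    """Plain matrix product of two lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

a = [[F(4), F(2)], [F(2), F(-2)]]               # the original example
u = [[F(4), F(2)], [F(0), F(-3)]]               # A after the reduction step
factor = F(-1, 2)                               # FACTOR saved by the program
l_reversed = [[F(1), F(0)], [-factor, F(1)]]    # sign reversed, unit diagonal
l_as_stored = [[F(1), F(0)], [factor, F(1)]]    # sign as stored, for comparison

lu = matmul(l_reversed, u)
```

Here `lu` equals A exactly, while `matmul(l_as_stored, u)` gives a second row of (-2, -4) instead of (2, -2).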

The Crout Decomposition 

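The most useful LU method for the 8087 is called the Crout decomposition.
(The Crout decomposition is a member of the family called "compact"
methods.) The defining characteristic of the Crout decomposition is that
U has all ones along the diagonal. So this LU decomposition looks like
this:

$$
A = LU =
\begin{bmatrix}
L_{0,0} & 0 & \cdots & 0 \\
L_{1,0} & L_{1,1} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
L_{N-1,0} & L_{N-1,1} & \cdots & L_{N-1,N-1}
\end{bmatrix}
\begin{bmatrix}
1 & U_{0,1} & U_{0,2} & \cdots & U_{0,N-1} \\
0 & 1 & U_{1,2} & \cdots & U_{1,N-1} \\
0 & 0 & 1 & \cdots & \vdots \\
\vdots & \vdots & & \ddots & U_{N-2,N-1} \\
0 & 0 & \cdots & 0 & 1
\end{bmatrix}
$$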

Notice the special pattern in which the rows of L match up with the
columns of U. The inner product of row 0 of L and column 0 of U, which
equals $A_{0,0}$, is just $L_{0,0}$. Moving down to the second row of L, we see
that $L_{1,0}$ equals $A_{1,0}$, and so forth. In this way, the entire first column
of L is defined.

Now multiply the first row of L with the second column of U. We find
$L_{0,0}$ times $U_{0,1}$ equals $A_{0,1}$. Since we already know $L_{0,0}$, we can solve for
$U_{0,1}$ directly. Moving on to the third column of U gets us $U_{0,2}$ in the
same manner. In this way the entire first row of U is defined.

Having defined the first column of L and the first row of U, we move
on to the second column of L and the second row of U. In effect, the Crout
procedure marches down the diagonal of the matrix. At each step, the
portion of the column of L hanging down from the diagonal and the
portion of the row of U sticking out to the right are defined. To conserve
space, we reuse A to store L and U. The following BASIC program
performs a Crout decomposition.

10 DEFINT I-N
20 DEFDBL S
30 DIM A(N-1,N-1)
40 FOR K=0 TO N-1
60 REM FILL IN COLUMN OF L
70 FOR I=K TO N-1
80 SUM=0
90 FOR L=0 TO K-1
100 SUM=SUM+A(I,L)*A(L,K)
110 NEXT L
120 A(I,K)=A(I,K)-SUM
130 NEXT I
140 REM FILL IN ROW OF U
145 IF A(K,K)=0 THEN PRINT "ZERO PIVOT";K:STOP
150 FOR J=K+1 TO N-1
160 SUM=0
170 FOR L=0 TO K-1
180 SUM=SUM+A(K,L)*A(L,J)
190 NEXT L
200 A(K,J)=(A(K,J)-SUM)/A(K,K)
210 NEXT J
220 NEXT K
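For readers who want to experiment without BASIC, here is a rough Python transcription of the program above. It is a sketch rather than the book's code: the function name `crout_in_place` is made up, the matrix is a list of rows indexed a[i][j] like A(I,J), and a zero pivot raises an exception instead of printing and stopping. The two inner `sum` expressions play the role of BASIC lines 80-110 and 160-190.

```python
def crout_in_place(a):
    """Crout LU decomposition without pivoting, overwriting `a`.

    On return a[i][k] (i >= k) holds L and a[k][j] (j > k) holds U;
    U's diagonal of ones is implied rather than stored.
    """
    n = len(a)
    for k in range(n):
        # Fill in column k of L (BASIC lines 70-130).
        for i in range(k, n):
            s = sum(a[i][l] * a[l][k] for l in range(k))
            a[i][k] -= s
        # The pivot L[k][k] is now final; row k of U divides by it.
        if a[k][k] == 0:
            raise ZeroDivisionError(f"zero pivot at k={k}")
        # Fill in row k of U (BASIC lines 150-210).
        for j in range(k + 1, n):
            s = sum(a[k][l] * a[l][j] for l in range(k))
            a[k][j] = (a[k][j] - s) / a[k][k]
    return a


print(crout_in_place([[3.0, 6.0], [4.0, 2.0]]))  # [[3.0, 2.0], [4.0, -6.0]]
```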
Do you see why the Crout decomposition is so well suited to the 8087?
Lines 80-110 and 160-190 form inner products between a portion of a
row of L and a portion of a column of U! By using INPROD or GINPROD
we take full advantage of the 8087's speed. Probably of greater
importance, since the inner products are accumulated in temporary real
precision, we can store a matrix in single or double precision and still get
almost all the accuracy of 80-bit storage.
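The accuracy point can be illustrated without an 8087. The Python sketch below rests on a stated substitution: ordinary 64-bit floats stand in for the 8087's 80-bit temporary real format, and the standard `struct` module is used to round values to 32-bit single precision. The function names are hypothetical. Accumulating an inner product in single precision loses the small terms entirely, while accumulating in the wider format and rounding only the final result keeps them.

```python
import struct


def to_single(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def inprod_single(row, col):
    """Inner product with every product and partial sum rounded to single."""
    total = 0.0
    for r, c in zip(row, col):
        total = to_single(total + to_single(r * c))
    return total


def inprod_wide(row, col):
    """Inner product accumulated in double, rounded to single once at the end."""
    total = 0.0
    for r, c in zip(row, col):
        total += r * c
    return to_single(total)


row = [1.0e8, 1.0, 1.0, 1.0, 1.0, -1.0e8]  # every entry is exact in single
col = [1.0] * len(row)
print(inprod_single(row, col))  # 0.0  (each +1 is lost next to 1e8)
print(inprod_wide(row, col))    # 4.0
```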


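If you'd like to get a better handle on the logic of the Crout
decomposition, you might try reducing the following 2 by 2 matrix into upper
and lower triangular matrices.

$$A = \begin{bmatrix} 3 & 6 \\ 4 & 2 \end{bmatrix}$$

You should end up with these two matrices:

$$L = \begin{bmatrix} 3 & 0 \\ 4 & -6 \end{bmatrix}, \qquad U = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$$

Note that our program stores both L and U in place of A.

$$A_{\text{reduced}} = \begin{bmatrix} 3 & 2 \\ 4 & -6 \end{bmatrix}$$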

The "zero pivot" problem has returned with this version of Crout
decomposition. We adapt this program to include partial pivoting by
exchanging rows just before filling each row of U. The next BASIC
program illustrates Crout decomposition with partial pivoting.

5 REM PROGRAM CROUT-PP
100 DEFINT I-N
200 DEFDBL S
300 DIM A(N-1,N-1),INDEX(N-1),Y(N-1)
310 FOR I=0 TO N-1
320 INDEX(I)=I
330 NEXT I
400 FOR K=0 TO N-1
600 REM FILL IN COLUMN OF L
700 FOR I=K TO N-1
710 IX=INDEX(I)
800 SUM=0
900 FOR L=0 TO K-1
910 LX=INDEX(L)
1000 SUM=SUM+A(IX,L)*A(LX,K)
1100 NEXT L
1200 A(IX,K)=A(IX,K)-SUM
1300 NEXT I
1310 REM SWAP ROWS
1320 GOSUB 5000
1330 SWAP INDEX(K),INDEX(KBIGGEST)
1400 REM FILL IN ROW OF U
1410 KX=INDEX(K)
1420 IF A(KX,K)=0 THEN PRINT "SINGULAR MATRIX";K:STOP
1500 FOR J=K+1 TO N-1
1600 SUM=0
1700 FOR L=0 TO K-1
1710 LX=INDEX(L)
1800 SUM=SUM+A(KX,L)*A(LX,J)
1900 NEXT L
2000 A(KX,J)=(A(KX,J)-SUM)/A(KX,K)
2100 NEXT J
2200 NEXT K
3000 STOP
5000 REM SUBROUTINE TO FIND LARGEST ELEMENT IN COLUMN
5100 BIGGEST=ABS(A(INDEX(K),K))
5200 KBIGGEST=K
5300 FOR I=K+1 TO N-1
5400 PIV=ABS(A(INDEX(I),K))
5500 IF PIV>BIGGEST THEN BIGGEST=PIV:KBIGGEST=I
5600 NEXT I
5700 RETURN

This program effectively takes the original A, permutes A by swapping
rows as indicated in INDEX, and then replaces A with the Crout
decomposition of the permuted A. In the solution phase, we'll have to undo
the row swaps.
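A Python rendering of the partial-pivoting program may make the role of INDEX easier to follow. This is an illustrative sketch under the same assumptions as a plain transcription (list-of-rows storage, an exception instead of STOP); the name `crout_pp` is invented here. Rows are never physically moved: only the entries of `index` are exchanged, as line 1330 does with SWAP, and ties go to the first candidate, as with the strict `>` test in line 5500.

```python
def crout_pp(a):
    """Crout decomposition with partial pivoting through an index vector.

    Overwrites `a` and returns (a, index). Row index[k] of the result
    holds row k of the factors of the permuted matrix.
    """
    n = len(a)
    index = list(range(n))
    for k in range(n):
        # Fill in column k of L for every row not yet used as a pivot.
        for i in range(k, n):
            ix = index[i]
            s = sum(a[ix][l] * a[index[l]][k] for l in range(k))
            a[ix][k] -= s
        # Largest |A(INDEX(I),K)| for I >= K; max() keeps the first on ties.
        kbiggest = max(range(k, n), key=lambda i: abs(a[index[i]][k]))
        index[k], index[kbiggest] = index[kbiggest], index[k]
        kx = index[k]
        if a[kx][k] == 0:
            raise ZeroDivisionError(f"singular matrix at k={k}")
        # Fill in row k of U.
        for j in range(k + 1, n):
            s = sum(a[kx][l] * a[index[l]][j] for l in range(k))
            a[kx][j] = (a[kx][j] - s) / a[kx][k]
    return a, index


reduced, index = crout_pp([[0.0, 1.0], [2.0, 3.0]])
print(index)    # [1, 0]
print(reduced)  # [[0.0, 1.0], [2.0, 1.5]]
```

The first row of this test matrix starts with a zero, so the unpivoted program would stop at once; here the pivot search simply promotes row 1.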

8087 Routines for Solving Systems of Linear Equations

The time has finally arrived to prepare high-speed 8087 routines for
solving systems of linear equations and for matrix inversion. Three
routines based on the Crout decomposition with partial pivoting appear
below. REDUCE reduces a matrix to its LU decomposition. Given the
reduced matrix and the vector y, SOLVE calculates x, as in y = Ax. INV
inverts a matrix in one step.

For maximum flexibility, we write a series of 8087 internal procedures,
and then add external procedures that may be called from BASIC. Our
first procedure, PIV, finds the largest element of a column, indexed by
INDEX, and exchanges indexes to accomplish partial pivoting.

;SUBROUTINE PIV(A,INDEX,TYPEA,DIAG,N)
; ASSUMPTIONS: A IS ADDRESS OF N-ARRAY IN DATA SEGMENT
;              INDEX IS AN INTEGER N-ARRAY
;              TYPEA,DIAG,N ARE INTEGERS
;
; NOTE THIS PROCEDURE CANNOT BE CALLED FROM BASIC
;   IT FINDS ITS ARGUMENTS ON THE STACK
;   NOT THEIR ADDRESSES
;
; THERE MUST BE AT LEAST 2 FREE LOCATIONS ON
;   THE 8087 STACK AND AT LEAST 14 FREE BYTES ON
;   THE MEMORY STACK
; THE LAST 2 WORDS OF LOCAL_SPACE MUST ALSO BE
;   FREE
;
; PIV SEARCHES A FROM DIAG TO THE BOTTOM FOR THE
;   ELEMENT LARGEST IN ABSOLUTE VALUE. PIV EXCHANGES
;   THE INDEXES OF DIAG AND THIS ELEMENT
; A IS AN N-VECTOR PERMUTED ACCORDING TO INDEX
;
        ASSUME  CS:CSEG,ES:ESEG
PIV     PROC    NEAR
        PUSH    BP
        MOV     BP,SP           ;SINCE THIS IS A NEAR PROCEDURE,
                                ; ARGUMENTS BEGIN AT [BP]+4
        PUSH    AX
        PUSH    BX
        PUSH    CX
        PUSH    DX
        PUSH    SI
        PUSH    DI
        MOV     CX,[BP]+4       ;CX = N
        MOV     SI,[BP]+12      ;SI = ADDR(A)
        MOV     DI,[BP]+10      ;DI = ADDR(INDEX)
        MOV     DX,[BP]+6       ;DX = DIAG
        ADD     DI,DX
        ADD     DI,DX           ;DI = ADDR(INDEX(DIAG))
KBIGGEST EQU    LOCAL_SPACE_LAST-2  ;KEEP BIGGEST FOUND
        MOV     KBIGGEST,DI     ;ASSUME FIRST IS BIGGEST
        SUB     CX,DX           ;CX = N-DIAG
        DEC     CX              ;# OF ELEMENTS TO CHECK
        MOV     AX,[BP]+8       ;AX = TYPEA
        MUL     WORD PTR [DI]   ;AX = TYPEA*INDEX(DI)
        MOV     BX,AX
        CMP     WORD PTR [BP]+8,4   ;IS A SINGLE?
        JNE     A_DOUBLE
        FLD     DWORD PTR [SI][BX]  ;LOAD SINGLE
        JMP     LOADED_ONE
A_DOUBLE:  FLD  QWORD PTR [SI][BX]  ;LOAD DOUBLE
LOADED_ONE:
        FABS
        JCXZ    DONE
COMP_LOOP:
        ADD     DI,2            ;NEXT INDEX
        MOV     AX,[BP]+8       ;AX = TYPEA
        MUL     WORD PTR [DI]   ;AX = TYPEA*INDEX(DI)
        MOV     BX,AX
        CMP     WORD PTR [BP]+8,4   ;IS A SINGLE?
        JNE     A1_DOUBLE
        FLD     DWORD PTR [SI][BX]  ;LOAD SINGLE
        JMP     COMPARE
A1_DOUBLE: FLD  QWORD PTR [SI][BX]  ;LOAD DOUBLE
COMPARE:
        FABS
        FCOM                    ;COMPARE NEW TO BIGGEST
STATUS_WORD EQU LOCAL_SPACE_LAST-4  ;SCRATCH SPACE
        FSTSW   STATUS_WORD
        FWAIT
        MOV     AH,BYTE PTR STATUS_WORD+1
        SAHF
        JB      LESS_OR_NONCOMPARABLE
                                ;HERE IF NEW IS GREATER THAN
                                ; OR EQUAL TO BIGGEST
        FSTP    ST(1)           ;MOVE NEW DOWN STACK
        MOV     KBIGGEST,DI
        JMP     NEXT_ELEMENT
LESS_OR_NONCOMPARABLE:
        FSTP    ST(0)           ;BIGGEST IS STILL CHAMP
NEXT_ELEMENT:
        LOOP    COMP_LOOP
                                ;SWAP INDEX(DIAG) AND INDEX(KBIGGEST)
        MOV     DX,[BP]+6       ;DX = DIAG
        MOV     DI,[BP]+10      ;DI = ADDR(INDEX)
        ADD     DI,DX
        ADD     DI,DX           ;DI = ADDR(INDEX(DIAG))
        MOV     AX,[DI]         ;AX = INDEX(DIAG)
        MOV     BX,KBIGGEST     ;BX = ADDR(INDEX(KBIGGEST))
        XCHG    AX,[BX]
        XCHG    AX,[DI]
DONE:
        FSTP    ST(0)           ;CLEAR ELEMENT OFF STACK
        POP     DI
        POP     SI
        POP     DX
        POP     CX
        POP     BX
        POP     AX
        POP     BP
        RET     10
PIV     ENDP


Procedure PIV takes roughly 80 microseconds per element. When used
for partial pivoting, PIV searches n, n-1, n-2, and so forth elements at
successive calls. Quite roughly then, in solving a system of n equations,
we spend about $80n^2/2$ microseconds in PIV. For a large matrix, PIV might
take up half a second. If you'd like an exercise in array addressing
techniques, rewrite PIV replacing the MUL instruction in COMP_LOOP with
appropriate SHL (SHift Left) instructions. You should be able to speed
up PIV by about 25 percent.
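The timing estimate and the suggested SHL exercise both come down to simple arithmetic, sketched below in Python. The 80-microsecond figure is the text's rough estimate, not a measurement, and the function names are hypothetical. Because TYPEA is 4 or 8 bytes, the product TYPEA*INDEX that PIV forms with MUL equals INDEX shifted left by 2 or 3 bits.

```python
def piv_time_seconds(n, usec_per_element=80.0):
    """Rough total time spent in PIV while reducing an n by n matrix."""
    elements = n * (n + 1) // 2  # n + (n-1) + ... + 1 elements searched
    return elements * usec_per_element * 1e-6


def byte_offset(index, type_bytes):
    """TYPEA*INDEX computed with a shift instead of a multiply."""
    shift = {4: 2, 8: 3}[type_bytes]  # 4-byte single, 8-byte double
    return index << shift


print(round(piv_time_seconds(100), 3))       # 0.404
print(byte_offset(5, 4), byte_offset(5, 8))  # 20 40
```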

Most of the work of the Crout decomposition is a series of inner
products. Unfortunately, the rows are permuted according to INDEX, so we
can't use procedure GINP. GINP assumes that columns are stored
sequentially, while ours are "scrambled." Procedure XINP does an inner
product with indexed columns.
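The indexing XINP performs is easiest to see on a flat array. The sketch below assumes, as the assembly routines do, that BASIC stores A column by column, so A(I,J) sits at element I + J*N. The Python function `xinp` and its parameter names are illustrative stand-ins for the procedure's stack arguments, and offsets here count elements rather than bytes.

```python
def xinp(flat, row_start, col_start, index, skipa, count):
    """Inner product of a strided row with an INDEXed column.

    Returns the sum over l < count of
        flat[row_start + l*skipa] * flat[col_start + index[l]],
    i.e. A(I,L) * A(INDEX(L),J) when row_start is A(I,0), col_start is
    A(0,J) and skipa is N for a column-major N by N array.
    """
    total = 0.0
    for l in range(count):
        total += flat[row_start + l * skipa] * flat[col_start + index[l]]
    return total


n = 3
rows = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0]]
flat = [rows[i][j] for j in range(n) for i in range(n)]  # column-major
index = [2, 0, 1]
# A(1,0)*A(2,2) + A(1,1)*A(0,2) + A(1,2)*A(1,2) = 36 + 15 + 36
print(xinp(flat, 1, 2 * n, index, n, 3))  # 87.0
```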


;SUBROUTINE XINP(A(I,0),A(0,J),INDEX,TYPE,SKIPA,N)
; ASSUMPTIONS: A(I,0) IS THE ADDRESS OF ROW I
;              A(0,J) IS THE ADDRESS OF COLUMN J
;              INDEX IS THE ADDRESS OF INTEGER ARRAY INDEX
;              TYPE,SKIPA,N ARE INTEGERS
;
; NOTE THIS PROCEDURE CANNOT BE CALLED FROM BASIC
;   IT FINDS ITS ARGUMENTS ON THE STACK
;   NOT THEIR ADDRESSES
;
; THERE MUST BE AT LEAST 2 FREE LOCATIONS ON
;   THE 8087 STACK AND AT LEAST 14 FREE BYTES ON
;   THE MEMORY STACK
;
; XINP RETURNS THE INNER PRODUCT OF THE FIRST N
;   ELEMENTS OF ROW I AND COLUMN J ON THE
;   8087 STACK
; XINP TAKES EVERY SKIPA ELEMENT OF A(I,0) AND
;   INDEXES THE ELEMENTS OF A(0,J)
;
        ASSUME  CS:CSEG
XINP    PROC    NEAR
        PUSH    BP
        MOV     BP,SP           ;SINCE THIS IS A NEAR PROCEDURE,
                                ; ARGUMENTS BEGIN AT [BP]+4
        PUSH    AX
        PUSH    BX
        PUSH    CX
        PUSH    DX
        PUSH    SI
        PUSH    DI
        FLDZ                    ;SET RUNNING SUM=0
        MOV     SI,[BP]+14      ;SI = ADDR(A(I,0))
        MOV     DI,[BP]+10      ;DI = ADDR(INDEX)
;IF TYPE IS SINGLE PRECISION SET CX=2 ELSE SET CX=3
;USE CX FOR SHIFTING BELOW
        MOV     CX,3            ;ASSUME TYPE DOUBLE
        CMP     WORD PTR [BP]+8,4   ;IS TYPE SINGLE?
        JNE     NOT_SINGLE
        MOV     CX,2            ;YES, TYPE IS SINGLE
NOT_SINGLE:
        MOV     AX,[BP]+6       ;AX = SKIPA
        SHL     AX,CL           ;AX = A ELEMENT DISTANCE
        CMP     WORD PTR [BP]+4,0   ;N LE 0?
        JLE     DONE
XINP_LOOP:
                                ;FIRST GET READY FOR COLUMN
        MOV     BX,[DI]         ;BX = INDEX(L)
        SHL     BX,CL           ;BX = TYPE*INDEX(L)
        ADD     BX,[BP]+12      ;BX = ADDR(A(INDEX(L),J))
        CMP     CL,2            ;IS A SINGLE?
        JNE     A_DOUBLE
        FLD     DWORD PTR [SI]  ;LOAD SINGLE ROW ELEMENT
        JMP     MULT_2
A_DOUBLE:  FLD  QWORD PTR [SI]  ;LOAD DOUBLE ROW ELEMENT
MULT_2:
        CMP     CL,2            ;IS A SINGLE?
        JNE     A_DOUBLE2
        FMUL    DWORD PTR [BX]  ;MULTIPLY SINGLE
        JMP     NEXT_ELEMENT
A_DOUBLE2: FMUL QWORD PTR [BX]  ;MULTIPLY DOUBLE
NEXT_ELEMENT:
        FADDP   ST(1),ST        ;SUM = SUM+ROW(L)*COL(L)
        ADD     SI,AX           ;NEXT ROW ELEMENT
        ADD     DI,2            ;NEXT INDEX
        DEC     WORD PTR [BP]+4 ;DECREMENT COUNT (NOTE N
        CMP     WORD PTR [BP]+4,0   ; WAS IN TEMP LOCATION)
        JG      XINP_LOOP
DONE:
        POP     DI
        POP     SI
        POP     DX
        POP     CX
        POP     BX
        POP     AX
        POP     BP
        RET     12
XINP    ENDP


Like most of our inner product routines, XINP uses about 59
microseconds per element. Notice that we went to the trouble of using the
shift rather than the multiply. It actually takes the 8088 longer to multiply
two integers than it takes the 8087 to multiply single precision numbers.
If we multiplied rather than shifted, the 8087 would have to wait idly
while the 8088 calculated the next address.

Essentially all of the hard computational work of the Crout
decomposition is done by XINP. Because it is a NEAR procedure, XINP cannot
be called directly from BASIC. Procedure XINPROD, below, is a FAR
procedure, callable from BASIC, that calls XINP for us and then returns
the inner product as a double precision result. It's also convenient to be
able to use PIV, even though PIV is only called n times as compared to
the $n^2$ calls to XINP, so we also include a FAR procedure, PIVOT.

;SUBROUTINE XINPROD(A(I,0),A(0,J),SUM,INDEX,TYPE,SKIPA,N)
; ASSUMPTIONS: A(I,0),A(0,J),INDEX ARE ADDRESSES TO BE PASSED
;                TO XINP
;              TYPE,SKIPA,N ARE ADDRESSES OF INTEGERS WHOSE
;                VALUES SHOULD BE PASSED TO XINP
;              XINP RETURNS THE RESULT ON THE TOP OF STACK;
;                IT SHOULD BE PLACED IN DOUBLE PRECISION SUM
; XINPROD CALLS THE INTERNAL SUBROUTINE XINP
;
        PUBLIC  XINPROD
        ASSUME  CS:CSEG,ES:ESEG
XINPROD PROC    FAR
        PUSH    BP
        MOV     BP,SP
                                ;SET UP STACK AREA IN ESEG
        PUSH    ES
        CALL    NEXT
NEXT:   POP     AX
        SUB     AX,(OFFSET NEXT)-(OFFSET FIRST_INST)
        MOV     CL,4
        SHR     AX,CL
        MOV     BX,CS
        ADD     BX,ESEG
        SUB     BX,CSEG
        ADD     AX,BX
        MOV     ES,AX
        MOV     LOCAL_SPACE,SS
        MOV     LOCAL_SPACE+2,SP
        MOV     AX,ES
        MOV     SS,AX
        MOV     SP,OFFSET STACK_TOP
                                ;SET UP CALL PARAMETERS
        PUSH    DS:[BP]+18      ;ADDR(A(I,0))
        PUSH    DS:[BP]+16      ;ADDR(A(0,J))
        PUSH    DS:[BP]+12      ;ADDR(INDEX)
        MOV     BX,DS:[BP]+10   ;BX = ADDR(TYPE)
        PUSH    [BX]            ;TYPE
        MOV     BX,DS:[BP]+8    ;BX = ADDR(SKIPA)
        PUSH    [BX]            ;SKIPA
        MOV     BX,DS:[BP]+6    ;BX = ADDR(N)
        PUSH    [BX]            ;N
        CALL    XINP
        MOV     SP,LOCAL_SPACE+2
        MOV     SS,LOCAL_SPACE
        MOV     BX,[BP]+14      ;BX = ADDR(SUM)
        FSTP    QWORD PTR [BX]  ;STORE SUM
        POP     ES
        FWAIT
        POP     BP
        RET     14
XINPROD ENDP




;SUBROUTINE PIVOT(A(0,K),INDEX,TYPE,K,N)
; ASSUMPTIONS: A(0,K),INDEX ARE ADDRESSES TO BE PASSED
;                TO PIV
;              TYPE,K,N ARE ADDRESSES OF INTEGERS WHOSE
;                VALUES SHOULD BE PASSED TO PIV
; PIVOT CALLS THE INTERNAL SUBROUTINE PIV
;
        PUBLIC  PIVOT
        ASSUME  CS:CSEG,ES:ESEG
PIVOT   PROC    FAR
        PUSH    BP
        MOV     BP,SP
                                ;SET UP STACK AREA IN ESEG
        PUSH    ES
        CALL    NEXT
NEXT:   POP     AX
        SUB     AX,(OFFSET NEXT)-(OFFSET FIRST_INST)
        MOV     CL,4
        SHR     AX,CL
        MOV     BX,CS
        ADD     BX,ESEG
        SUB     BX,CSEG
        ADD     AX,BX
        MOV     ES,AX
        MOV     LOCAL_SPACE,SS
        MOV     LOCAL_SPACE+2,SP
        MOV     AX,ES
        MOV     SS,AX
        MOV     SP,OFFSET STACK_TOP
                                ;SET UP CALL PARAMETERS
        PUSH    DS:[BP]+14      ;ADDR(A(0,K))
        PUSH    DS:[BP]+12      ;ADDR(INDEX)
        MOV     BX,DS:[BP]+10   ;BX = ADDR(TYPE)
        PUSH    [BX]            ;TYPE
        MOV     BX,DS:[BP]+8    ;BX = ADDR(K)
        PUSH    [BX]            ;K
        MOV     BX,DS:[BP]+6    ;BX = ADDR(N)
        PUSH    [BX]            ;N
        CALL    PIV
        MOV     SP,LOCAL_SPACE+2
        MOV     SS,LOCAL_SPACE
        POP     ES
        POP     BP
        RET     10
PIVOT   ENDP




With XINPROD and PIVOT in hand, we need only replace the
appropriate lines of the BASIC program with CALL statements. The new
version of the BASIC program appears below.

100 DEFINT I-N
200 DEFDBL S
300 DIM A(N-1,N-1),INDEX(N-1)
310 FOR I=0 TO N-1
320 INDEX(I)=I
330 NEXT I
340 ITYPE=4
400 FOR K=0 TO N-1
600 REM FILL IN COLUMN OF L
700 FOR I=K TO N-1
710 IX=INDEX(I)
800 SUM=0
900 REM FOR L=0 TO K-1
910 REM LX=INDEX(L)
1000 REM SUM=SUM+A(IX,L)*A(LX,K)
1100 REM NEXT L
1150 CALL XINPROD(A(IX,0),A(0,K),SUM,INDEX(0),ITYPE,N,K)
1200 A(IX,K)=A(IX,K)-SUM
1300 NEXT I
1310 REM SWAP ROWS
1320 REM GOSUB 5000
1330 REM SWAP INDEX(K),INDEX(KBIGGEST)
1350 CALL PIVOT(A(0,K),INDEX(0),ITYPE,K,N)
1400 REM FILL IN ROW OF U
1410 KX=INDEX(K)
1420 IF A(KX,K)=0 THEN PRINT "SINGULAR MATRIX";K:STOP
1500 FOR J=K+1 TO N-1
1600 SUM=0
1700 REM FOR L=0 TO K-1
1710 REM LX=INDEX(L)
1800 REM SUM=SUM+A(KX,L)*A(LX,J)
1900 REM NEXT L
1950 CALL XINPROD(A(KX,0),A(0,J),SUM,INDEX(0),ITYPE,N,K)
2000 A(KX,J)=(A(KX,J)-SUM)/A(KX,K)
2100 NEXT J
2200 NEXT K
5000 REM SUBROUTINE TO FIND LARGEST ELEMENT IN COLUMN
5100 REM BIGGEST=ABS(A(INDEX(K),K))
5200 REM KBIGGEST=K
5300 REM FOR I=K+1 TO N-1
5400 REM PIV=ABS(A(INDEX(I),K))
5500 REM IF PIV>BIGGEST THEN BIGGEST=PIV:KBIGGEST=I
5600 REM NEXT I
5700 REM RETURN

XINP does most of the hard work of Crout reduction. For large n, most
execution time is spent doing the inner products, so the BASIC code
above is quite efficient. Lines 700-1300 and 1500-2100 are executed on the
order of $n^2$ times. For moderate n, these lines may take up a substantial
amount of time. In procedure CROUTP we put everything together into an 8087
program for Crout reduction with partial pivoting.

;SUBROUTINE CROUTP(A,INDEX,IER,TYPE,N)
; ASSUMPTIONS: A IS THE ADDRESS OF AN N BY N MATRIX
;              INDEX IS THE ADDRESS OF AN INTEGER N-ARRAY
;              IER IS THE ADDRESS OF AN INTEGER
;              TYPE,N ARE INTEGERS
;
; NOTE THIS PROCEDURE CANNOT BE CALLED FROM BASIC
;   IT FINDS ITS ARGUMENTS ON THE STACK
;   NOT THEIR ADDRESSES
;
; CROUTP REPLACES A WITH THE CROUT LU DECOMPOSITION
;   OF THE PERMUTATION OF A RETURNED IN INDEX
; AT EXIT IER=-1 IF MATRIX IS SINGULAR, ELSE
;   IER=0
;
        ASSUME  CS:CSEG
CROUTP  PROC    NEAR
        PUSH    BP
        MOV     BP,SP           ;SINCE THIS IS A NEAR PROCEDURE,
                                ; ARGUMENTS BEGIN AT [BP]+4
        PUSH    AX              ;THESE ARE UNNECESSARY
        PUSH    BX
        PUSH    CX
        PUSH    DX
        PUSH    SI
        PUSH    DI              ;BUT GOOD FORM
        MOV     BX,[BP]+8       ;BX = ADDR(IER)
        MOV     WORD PTR [BX],0 ;IER = 0
;FIRST SET INDEX(I)=I
        MOV     CX,[BP]+4       ;CX = N
        MOV     DI,[BP]+10      ;DI = ADDR(INDEX)
        MOV     AX,0
INDEX_LOOP:
        MOV     [DI],AX         ;INDEX(I) = I
        INC     AX
        ADD     DI,2
        LOOP    INDEX_LOOP
;IF TYPE IS SINGLE PRECISION SET CX=2 ELSE SET CX=3
;USE CX FOR SHIFTING BELOW
        MOV     CX,3            ;ASSUME TYPE DOUBLE
        CMP     WORD PTR [BP]+6,4   ;IS TYPE SINGLE?
        JNE     NOT_SINGLE
        MOV     CX,2            ;YES, TYPE IS SINGLE
NOT_SINGLE:
;OUTERMOST LOOP IS FOR K=0 TO N-1
        MOV     SI,0            ;KEEP K IN SI
MAJOR_LOOP:
        MOV     AX,[BP]+4       ;AX = N
        MUL     SI              ;AX = N*K
        SHL     AX,CL           ;AX = TYPE*N*K
        MOV     DX,AX
        ADD     DX,[BP]+12      ;DX = ADDR(A(0,K))
;FILL IN COLUMN OF L
;MOVE THROUGH INDEX(I) FOR I=K TO N-1
COUNT   EQU     LOCAL_SPACE+4   ;SCRATCH SPACE FOR COUNTS
        MOV     AX,[BP]+4       ;AX = N
        SUB     AX,SI           ;AX = N-K
        MOV     COUNT,AX        ;COUNT = N-K
        MOV     DI,[BP]+10      ;DI = ADDR(INDEX(0))
        ADD     DI,SI
        ADD     DI,SI           ;DI = ADDR(INDEX(I))
L_LOOP:
;CALL XINP(A(INDEX(I),0),A(0,K),INDEX,TYPE,N,K)
        MOV     BX,[DI]         ;BX = INDEX(I)
        SHL     BX,CL           ;BX = BEGINNING OF ROW
        ADD     BX,[BP]+12      ;BX = ADDR(A(INDEX(I),0))
        PUSH    BX
        PUSH    DX              ;ADDR(A(0,K))
        PUSH    [BP]+10         ;ADDR(INDEX)
        PUSH    [BP]+6          ;TYPE
        PUSH    [BP]+4          ;N
        PUSH    SI              ;K
        CALL    XINP
;GET ADDRESS OF A(INDEX(I),K)
        MOV     BX,[DI]         ;BX = INDEX(I)
        SHL     BX,CL           ;BX = TYPE*INDEX(I)
        ADD     BX,DX           ;BX = ADDR(A(INDEX(I),K))
;CALCULATE A(INDEX(I),K)-SUM FOR SINGLE OR DOUBLE PRECISION
        CMP     CX,2            ;SINGLE?
        JNE     A_DOUBLE
        FSUBR   DWORD PTR [BX]  ;ST = A(INDEX(I),K)-SUM
        FSTP    DWORD PTR [BX]  ;A(INDEX(I),K) = ST
        JMP     NEXT_COL_ELEMENT
A_DOUBLE:  FSUBR QWORD PTR [BX] ;ST = A(INDEX(I),K)-SUM
        FSTP    QWORD PTR [BX]  ;A(INDEX(I),K) = ST
        JMP     NEXT_COL_ELEMENT
NEXT_COL_ELEMENT:
        ADD     DI,2            ;NEXT INDEX(I)
        DEC     COUNT
        CMP     COUNT,0
        JG      L_LOOP
;CALL PIV(A(0,K),INDEX,TYPE,K,N)
        PUSH    DX              ;ADDR(A(0,K))
        PUSH    [BP]+10         ;ADDR(INDEX)
        PUSH    [BP]+6          ;TYPE
        PUSH    SI              ;K
        PUSH    [BP]+4          ;N
        CALL    PIV
;*******CHECK FOR SINGULAR MATRIX
;IS A(INDEX(K),K)=0 ???
STATUS_WORD EQU LOCAL_SPACE+6
        MOV     DI,[BP]+10      ;DI = ADDR(INDEX)
        ADD     DI,SI
        ADD     DI,SI           ;DI = ADDR(INDEX(K))
        MOV     DI,[DI]         ;DI = INDEX(K)
        SHL     DI,CL           ;DI = TYPE*INDEX(K)
        ADD     DI,DX           ;DI = ADDR(A(INDEX(K),K))
        CMP     CX,2            ;SINGLE?
        JNE     A_DOUBLE2
        FLD     DWORD PTR [DI]  ;LOAD A(INDEX(K),K)
                                ; AND LEAVE IT ON STACK
        JMP     TEST_FOR_ZERO
A_DOUBLE2: FLD  QWORD PTR [DI]
TEST_FOR_ZERO:
        FTST
        FSTSW   STATUS_WORD
        FWAIT
        MOV     AH,BYTE PTR STATUS_WORD+1
        SAHF
        JC      NOT_SINGULAR    ;JUMP IF C0=1 (NEGATIVE)
        JNZ     NOT_SINGULAR    ;OR IF C3=0 (NONZERO)
;SINGULAR MATRIX
        MOV     BX,[BP]+8       ;BX = ADDR(IER)
        MOV     WORD PTR [BX],-1    ;IER = -1
        FSTP    ST(0)           ;CLEAR ST
        JMP     DONE
NOT_SINGULAR:
;FILL IN ROW OF U
;MOVE J THROUGH K+1 TO N-1
        MOV     AX,[BP]+4       ;AX = N
        SUB     AX,SI           ;AX = N-K
        MOV     DI,[BP]+10      ;DI = ADDR(INDEX)
        ADD     DI,SI
        ADD     DI,SI           ;DI = ADDR(INDEX(K))
        MOV     DI,[DI]         ;DI = INDEX(K)
        SHL     DI,CL           ;DI = TYPE*INDEX(K)
        ADD     DI,[BP]+12      ;DI = ADDR(A(INDEX(K),0))
        MOV     AX,SI           ;AX = K
        MUL     WORD PTR [BP]+4 ;AX = N*K
        SHL     AX,CL           ;AX = TYPE*N*K
        MOV     DX,[BP]+4       ;DX = N
        SUB     DX,SI           ;DX = N-K
        MOV     COUNT,DX
U_LOOP: DEC     COUNT           ;COUNT = COUNT-1
        CMP     COUNT,0
        JLE     END_U_LOOP
        MOV     BX,[BP]+4       ;BX = N
        SHL     BX,CL           ;BX = TYPE*N
        ADD     AX,BX           ;AX = TYPE*N*J
;CALL XINP(A(INDEX(K),0),A(0,J),INDEX,TYPE,N,K)
        PUSH    DI              ;ADDR(A(INDEX(K),0))
        MOV     BX,[BP]+12      ;BX = ADDR(A(0,0))
        ADD     BX,AX
        PUSH    BX              ;ADDR(A(0,J))
        PUSH    [BP]+10         ;ADDR(INDEX)
        PUSH    [BP]+6          ;TYPE
        PUSH    [BP]+4          ;N
        PUSH    SI              ;K
        CALL    XINP
;CALCULATE (A(INDEX(K),J)-SUM)/A(INDEX(K),K)
;  FOR SINGLE OR DOUBLE PRECISION
;NOTE THAT SUM IS IN ST AND A(INDEX(K),K) IN ST(1)
        MOV     BX,DI           ;BX = ADDR(A(INDEX(K),0))
        ADD     BX,AX           ;BX = ADDR(A(INDEX(K),J))
        CMP     CX,2            ;SINGLE?
        JNE     A_DOUBLE3
        FSUBR   DWORD PTR [BX]  ;ST = A(INDEX(K),J)-SUM
        FDIV    ST,ST(1)        ;ST = ST/A(INDEX(K),K)
        FSTP    DWORD PTR [BX]  ;A(INDEX(K),J) = ST
        JMP     NEXT_ROW_ELEMENT
A_DOUBLE3:
        FSUBR   QWORD PTR [BX]  ;ST = A(INDEX(K),J)-SUM
        FDIV    ST,ST(1)        ;ST = ST/A(INDEX(K),K)
        FSTP    QWORD PTR [BX]  ;A(INDEX(K),J) = ST
NEXT_ROW_ELEMENT:
        JMP     U_LOOP
END_U_LOOP:
        FSTP    ST(0)           ;CLEAR ST
;READY FOR NEXT K
        INC     SI
        CMP     SI,WORD PTR [BP]+4  ;K = N?
        JGE     DONE
        JMP     MAJOR_LOOP
DONE:
        POP     DI
        POP     SI
        POP     DX
        POP     CX
        POP     BX
        POP     AX
        POP     BP
        RET     10
CROUTP  ENDP


All we need now is a procedure to call CROUTP from BASIC. We'll
call this procedure REDUCE. Procedure REDUCE is called by

CALL REDUCE(A(0,0),INDEX(0),TYPE,IER,N)

where A is the N by N matrix of coefficients, and INDEX is an integer array
returning the row permutations. TYPE, IER, and N are integers. TYPE
indicates whether A is single or double precision. IER returns 0 if the
matrix is nonsingular and -1 if the matrix is singular. REDUCE replaces
A with its Crout reduction with partial pivoting.

;SUBROUTINE REDUCE(A(0,0),INDEX(0),TYPE,IER,N)
; ASSUMPTIONS: A(0,0),INDEX(0),IER ARE ADDRESSES TO BE PASSED
;                TO CROUTP
;              TYPE,N ARE ADDRESSES OF INTEGERS WHOSE
;                VALUES SHOULD BE PASSED TO CROUTP
; REDUCE CALLS THE INTERNAL SUBROUTINE CROUTP
;
        PUBLIC  REDUCE
        ASSUME  CS:CSEG,ES:ESEG
REDUCE  PROC    FAR
        PUSH    BP
        MOV     BP,SP
                                ;SET UP STACK AREA IN ESEG
        PUSH    ES
        CALL    NEXT
NEXT:   POP     AX
        SUB     AX,(OFFSET NEXT)-(OFFSET FIRST_INST)
        MOV     CL,4
        SHR     AX,CL
        MOV     BX,CS
        ADD     BX,ESEG
        SUB     BX,CSEG
        ADD     AX,BX
        MOV     ES,AX
        MOV     LOCAL_SPACE,SS
        MOV     LOCAL_SPACE+2,SP
        MOV     AX,ES
        MOV     SS,AX
        MOV     SP,OFFSET STACK_TOP
                                ;SET UP CALL PARAMETERS
        PUSH    DS:[BP]+14      ;ADDR(A(0,0))
        PUSH    DS:[BP]+12      ;ADDR(INDEX)
        PUSH    DS:[BP]+8       ;ADDR(IER)
        MOV     BX,DS:[BP]+10   ;BX = ADDR(TYPE)
        PUSH    [BX]            ;TYPE
        MOV     BX,DS:[BP]+6    ;BX = ADDR(N)
        PUSH    [BX]            ;N
        CALL    CROUTP
        MOV     SP,LOCAL_SPACE+2
        MOV     SS,LOCAL_SPACE
        POP     ES
        POP     BP
        RET     10
REDUCE  ENDP




Back Substitution After a Crout Reduction 

REDUCE leaves the LU decomposition in A and the order of row
permutation in INDEX. Temporarily setting aside the question of INDEXing,
we face a computationally straightforward problem of solving y = LUx
for x. We do this in two steps. First, solve y = Ly* for y*. Second, solve
y* = Ux for x.

Examination of the triangular matrices above shows the simple form
for solving triangular systems of equations. For a lower triangular system,

$$\begin{aligned}
y_0 &= L_{0,0}\,y^*_0 \\
y_1 &= L_{1,0}\,y^*_0 + L_{1,1}\,y^*_1 \\
y_2 &= L_{2,0}\,y^*_0 + L_{2,1}\,y^*_1 + L_{2,2}\,y^*_2
\end{aligned}$$

and so forth.

Turning these equations around we can solve directly for y*.

$$\begin{aligned}
y^*_0 &= y_0 / L_{0,0} \\
y^*_1 &= (y_1 - L_{1,0}\,y^*_0) / L_{1,1} \\
y^*_2 &= \bigl(y_2 - (L_{2,0}\,y^*_0 + L_{2,1}\,y^*_1)\bigr) / L_{2,2}
\end{aligned}$$

and so forth.

For an upper triangular system we have:

$$\begin{aligned}
y^*_{n-1} &= U_{n-1,n-1}\,x_{n-1} \\
y^*_{n-2} &= U_{n-2,n-2}\,x_{n-2} + U_{n-2,n-1}\,x_{n-1} \\
y^*_{n-3} &= U_{n-3,n-3}\,x_{n-3} + U_{n-3,n-2}\,x_{n-2} + U_{n-3,n-1}\,x_{n-1}
\end{aligned}$$

and so forth.

As we turn these equations around to solve for x, remember that $U_{i,i}$
equals 1 after the Crout reduction.

$$\begin{aligned}
x_{n-1} &= y^*_{n-1} \\
x_{n-2} &= y^*_{n-2} - U_{n-2,n-1}\,x_{n-1} \\
x_{n-3} &= y^*_{n-3} - (U_{n-3,n-2}\,x_{n-2} + U_{n-3,n-1}\,x_{n-1})
\end{aligned}$$

and so forth.
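Written out in general, the two patterns above become a pair of recurrences, one per triangle. This only restates the "and so forth" in the same notation; the second line already uses $U_{i,i} = 1$.

$$y^*_i = \frac{y_i - \sum_{j=0}^{i-1} L_{i,j}\, y^*_j}{L_{i,i}}, \qquad i = 0, 1, \ldots, n-1$$

$$x_i = y^*_i - \sum_{j=i+1}^{n-1} U_{i,j}\, x_j, \qquad i = n-1, n-2, \ldots, 0$$

Each sum is an inner product of part of a row of L or U with part of a vector, which is what makes these loops candidates for the 8087 inner product routines.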

The following BASIC code takes a Crout reduced matrix A and a column
vector Y and solves for X.

10 DEFINT I-N
20 DEFDBL S
30 DIM A(N-1,N-1),YSTAR(N-1),Y(N-1),X(N-1)
40 REM SOLVE LOWER TRIANGULAR SYSTEM FOR YSTAR
50 FOR I=0 TO N-1
60 SUM=0
70 FOR J=0 TO I-1
80 SUM=SUM+A(I,J)*YSTAR(J)
90 NEXT J
100 YSTAR(I)=(Y(I)-SUM)/A(I,I)
110 NEXT I
120 REM SOLVE UPPER TRIANGULAR SYSTEM FOR X
130 FOR I=N-1 TO 0 STEP -1
140 SUM=0
150 FOR J=I+1 TO N-1
160 SUM=SUM+A(I,J)*X(J)
170 NEXT J
180 X(I)=YSTAR(I)-SUM
190 NEXT I

Notice that solving the lower triangular system and solving the upper
triangular system are both order $n^2$ operations. Once order $n^3$ operations
have been performed to reduce A, each new y can be solved for x at the expense of
only order $n^2$ additional operations. Notice that lines 60-90 and 140-170
form inner products. We could use GINPROD here.

REDUCE performs row permutations in the process of generating a
triangular form. Our next set of BASIC code "undoes" the INDEXing
and also makes explicit use of GINPROD.

2500 REM PROGRAM SOLP
2600 ITYPE=4
2700 I1=1
2800 REM USE X FOR SCRATCH SPACE, RATHER THAN YSTAR
2900 REM SOLVE LOWER TRIANGULAR SYSTEM FOR YSTAR
3000 FOR I=0 TO N-1
3100 NUM=I
3200 CALL GINPROD(A(INDEX(I),0),X(0),SUM,ITYPE,ITYPE,N,I1,NUM)
3300 X(I)=(Y(INDEX(I))-SUM)/A(INDEX(I),I)
3400 NEXT I
3500 REM SOLVE UPPER TRIANGULAR SYSTEM FOR X
3600 FOR I=N-1 TO 0 STEP -1
3700 IP=I+1
3800 NUM=N-IP
3850 SUM=0
3900 IF NUM>0 THEN CALL GINPROD(A(INDEX(I),IP),X(IP),SUM,ITYPE,ITYPE,N,I1,NUM)
4000 X(I)=X(I)-SUM
4100 NEXT I
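In Python, the SOLP logic can be sketched as follows. This is an illustration, not the book's code: `solp` is a made-up name, plain loops stand in for GINPROD, and it expects a matrix already reduced in place together with its INDEX vector. The values in the usage line are the pivoted Crout reduction of the matrix [[0, 1], [2, 3]]. As in the BASIC, x doubles as scratch space for y*, and rows are read in INDEX order.

```python
def solp(a, index, y):
    """Solve y = Ax given the pivoted, in-place Crout reduction of A.

    a[index[i]][j] holds L (j <= i) or U (j > i) for row i of the
    permuted system; U's unit diagonal is implied.
    """
    n = len(a)
    x = [0.0] * n  # holds y* first, then is overwritten by x
    for i in range(n):  # lower triangle, taking y in pivot order
        ix = index[i]
        s = sum(a[ix][j] * x[j] for j in range(i))
        x[i] = (y[ix] - s) / a[ix][i]
    for i in range(n - 1, -1, -1):  # upper triangle, unit diagonal
        ix = index[i]
        s = sum(a[ix][j] * x[j] for j in range(i + 1, n))
        x[i] -= s  # no division needed: U[i][i] is 1
    return x


# [[0, 1], [2, 3]] times [1, 2] is [2, 8]
print(solp([[0.0, 1.0], [2.0, 1.5]], [1, 0], [2.0, 8.0]))  # [1.0, 2.0]
```

Because the reduction is done once, further right-hand sides cost only these two order $n^2$ passes.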

The BASIC program is easily recoded into an 8087 NEAR procedure, 
SOL, which can be called from BASIC by the external procedure SOLVE. 

;SUBROUTINE SOL(A,Y,X,INDEX,TYPEA,TYPEY,TYPEX,N)
; ASSUMPTIONS: A,Y,X,INDEX ARE ADDRESSES
;              TYPEA,TYPEY,TYPEX,N ARE INTEGERS
;
; NOTE THIS PROCEDURE CANNOT BE CALLED FROM BASIC
;   IT FINDS ITS ARGUMENTS ON THE STACK
;   NOT THEIR ADDRESSES
;
; SOL SOLVES Y=AX FOR X, WHERE A AND INDEX
;   RESULT FROM A CROUT DECOMPOSITION WITH PARTIAL
;   PIVOTING
;
        ASSUME  CS:CSEG
SOL     PROC    NEAR
        PUSH    BP
        MOV     BP,SP
        PUSH    AX
        PUSH    BX
        PUSH    CX
        PUSH    DX
        PUSH    SI
        PUSH    DI
;TAKE CARE OF LOWER TRIANGLE
;FOR I=0 TO N-1
        MOV     CX,[BP]+4       ;CX = N
        CMP     CX,0
        JG      AROUND
        JMP     DONE
AROUND: MOV     SI,0            ;KEEP I IN SI
        MOV     DI,[BP]+12      ;DI = ADDR(INDEX(0))
L_LOOP:
;CALL GINP(A(INDEX(I),0),X(0),TYPEA,TYPEX,N,1,I)
        MOV     AX,[DI]         ;AX = INDEX(I)
        MUL     WORD PTR [BP]+10    ;AX = TYPEA*INDEX(I)
        ADD     AX,[BP]+18      ;AX = ADDR(A(INDEX(I),0))
        PUSH    AX
        PUSH    [BP]+14         ;ADDR(X(0))
        PUSH    [BP]+10         ;TYPEA
        PUSH    [BP]+6          ;TYPEX
        PUSH    [BP]+4          ;N
        MOV     BX,1
        PUSH    BX              ;1
        PUSH    SI              ;I
        CALL    GINP            ;SUM IS NOW IN ST
;X(I)=(Y(INDEX(I))-SUM)/A(INDEX(I),I)
        MOV     BX,[DI]         ;BX = INDEX(I)
        SHL     BX,1
        SHL     BX,1            ;BX = 4*INDEX(I)
        CMP     WORD PTR [BP]+8,4   ;IS Y SINGLE?
        JNE     Y_DOUBLE
        ADD     BX,[BP]+16      ;BX = ADDR(Y(INDEX(I)))
        FSUBR   DWORD PTR [BX]  ;ST = Y(INDEX(I))-SUM
        JMP     DO_DIV
Y_DOUBLE:  SHL  BX,1            ;BX = 8*INDEX(I)
        ADD     BX,[BP]+16      ;BX = ADDR(Y(INDEX(I)))
        FSUBR   QWORD PTR [BX]  ;ST = Y(INDEX(I))-SUM
DO_DIV:
;AX HAS ADDR(A(INDEX(I),0)), SO ADD TYPEA*N*I
        MOV     BX,AX           ;BX = ADDR(A(INDEX(I),0))
        MOV     AX,SI           ;AX = I
        MUL     WORD PTR [BP]+4 ;AX = N*I
        SHL     AX,1
        SHL     AX,1            ;AX = 4*N*I
        CMP     WORD PTR [BP]+10,4  ;IS A SINGLE?
        JNE     A_DOUBLE
        ADD     BX,AX           ;BX = ADDR(A(INDEX(I),I))
        FDIV    DWORD PTR [BX]  ;(Y(INDEX(I))-SUM)/A(INDEX(I),I)
        JMP     DO_STORE
A_DOUBLE:  SHL  AX,1            ;AX = 8*N*I
        ADD     BX,AX           ;BX = ADDR(A(INDEX(I),I))
        FDIV    QWORD PTR [BX]  ;(Y(INDEX(I))-SUM)/A(INDEX(I),I)
DO_STORE:  MOV  BX,SI           ;BX = I
        SHL     BX,1
        SHL     BX,1            ;BX = 4*I
        CMP     WORD PTR [BP]+6,4   ;IS X SINGLE?
        JNE     X_DOUBLE
        ADD     BX,[BP]+14      ;BX = ADDR(X(I))
        FSTP    DWORD PTR [BX]  ;STORE X(I)
        JMP     L_BOTTOM
X_DOUBLE:  SHL  BX,1            ;BX = 8*I
        ADD     BX,[BP]+14      ;BX = ADDR(X(I))
        FSTP    QWORD PTR [BX]  ;STORE X(I)
L_BOTTOM:
        INC     SI              ;I = I+1
        ADD     DI,2            ;NEXT INDEX
        LOOP    GOTO_L_LOOP
        JMP     U_TOP
GOTO_L_LOOP: JMP L_LOOP
U_TOP:
;TAKE CARE OF UPPER TRIANGLE
;FOR I=N-1 TO 0 STEP -1
        MOV     SI,[BP]+4       ;KEEP I IN SI
        MOV     DI,[BP]+12      ;DI = ADDR(INDEX(0))
        ADD     DI,SI
        ADD     DI,SI           ;DI = ADDR(INDEX(N))
U_LOOP:
;CALL GINP(A(INDEX(I),I+1),X(I+1),TYPEA,TYPEX,N,1,N-I-1)
        DEC     SI              ;I = I-1
        SUB     DI,2            ;NEXT INDEX
        MOV     AX,SI           ;AX = I
        INC     AX              ;AX = I+1
        MUL     WORD PTR [BP]+4 ;AX = N*(I+1)
        ADD     AX,[DI]         ;AX = INDEX(I)+N*(I+1)
        SHL     AX,1
        SHL     AX,1            ;AX = 4*AX
        CMP     WORD PTR [BP]+10,4  ;IS A SINGLE?
        JE      A_SINGLE
        SHL     AX,1            ;AX = 8*(...)
A_SINGLE:  ADD  AX,[BP]+18      ;AX = ADDR(A(INDEX(I),I+1))
        PUSH    AX
        MOV     BX,SI           ;BX = I
        INC     BX              ;BX = I+1
        SHL     BX,1
        SHL     BX,1            ;BX = 4*(I+1)
        CMP     WORD PTR [BP]+6,4   ;IS X SINGLE?
        JE      X_SINGLE
        SHL     BX,1            ;BX = 8*(I+1)
X_SINGLE:  ADD  BX,[BP]+14      ;BX = ADDR(X(I+1))
        PUSH    BX              ;NOTE: LEAVE ADDR IN BX
        PUSH    [BP]+10         ;TYPEA
        PUSH    [BP]+6          ;TYPEX
        PUSH    [BP]+4          ;N
        MOV     DX,1
        PUSH    DX              ;1
        MOV     DX,[BP]+4       ;DX = N
        SUB     DX,SI
        DEC     DX              ;DX = N-(I+1)
        PUSH    DX
        CALL    GINP
;X(I)=X(I)-SUM
;NOTE BX STILL POINTS TO X(I+1)
        SUB     BX,[BP]+6       ;BX = ADDR(X(I))
        CMP     WORD PTR [BP]+6,4   ;ONCE AGAIN, IS X SINGLE?
        JNE     X_DOUBLE2
        FSUBR   DWORD PTR [BX]  ;ST = X(I)-SUM
        FSTP    DWORD PTR [BX]  ;STORE X(I)
        JMP     U_BOTTOM
X_DOUBLE2: FSUBR QWORD PTR [BX] ;ST = X(I)-SUM
        FSTP    QWORD PTR [BX]  ;STORE X(I)
U_BOTTOM:
        CMP     SI,0            ;DONE YET?
        JLE     DONE
        JMP     U_LOOP
DONE:
        POP     DI
        POP     SI
        POP     DX
        POP     CX
        POP     BX
        POP     AX
        POP     BP
        FWAIT
        RET     16
SOL     ENDP
;SUBROUTINE SOLVE(A(0,0),Y(0),X(0),INDEX(0),TYPEA,TYPEY,TYPEX,N)
; ASSUMPTIONS: A(0,0),Y(0),X(0),INDEX(0) ARE ADDRESSES
;                TO BE PASSED TO SOL
;              TYPEA,TYPEY,TYPEX,N ARE ADDRESSES OF INTEGERS
;                WHOSE VALUES SHOULD BE PASSED TO SOL
; SOLVE CALLS THE INTERNAL SUBROUTINE SOL
;
        PUBLIC  SOLVE
        ASSUME  CS:CSEG,ES:ESEG
SOLVE   PROC    FAR
        PUSH    BP
        MOV     BP,SP
                                ;SET UP STACK AREA IN ESEG
        PUSH    ES
        CALL    NEXT
NEXT:   POP     AX
        SUB     AX,(OFFSET NEXT)-(OFFSET FIRST_INST)
        MOV     CL,4
        SHR     AX,CL
        MOV     BX,CS
        ADD     BX,ESEG
        SUB     BX,CSEG
        ADD     AX,BX
        MOV     ES,AX
        MOV     LOCAL_SPACE,SS
        MOV     LOCAL_SPACE+2,SP
        MOV     AX,ES
        MOV     SS,AX
        MOV     SP,OFFSET STACK_TOP
                                ;SET UP CALL PARAMETERS
        PUSH    DS:[BP]+20      ;ADDR(A(0,0))
        PUSH    DS:[BP]+18      ;ADDR(Y(0))
        PUSH    DS:[BP]+16      ;ADDR(X(0))
        PUSH    DS:[BP]+14      ;ADDR(INDEX(0))
        MOV     BX,DS:[BP]+12   ;BX = ADDR(TYPEA)
        PUSH    [BX]            ;TYPEA
        MOV     BX,DS:[BP]+10   ;BX = ADDR(TYPEY)
        PUSH    [BX]            ;TYPEY
        MOV     BX,DS:[BP]+8    ;BX = ADDR(TYPEX)
        PUSH    [BX]            ;TYPEX
        MOV     BX,DS:[BP]+6    ;BX = ADDR(N)
        PUSH    [BX]            ;N
        CALL    SOL
        MOV     SP,LOCAL_SPACE+2
        MOV     SS,LOCAL_SPACE
        POP     ES
        POP     BP
        RET     16
SOLVE   ENDP


Matrix Inversion 

One good reason for creating procedures from modular programs is the
ease with which the subroutines may be rearranged. It is now quite easy
to prepare a matrix inversion subroutine. Since matrix inversion is
equivalent to solving a system of equations n times (first for the "y vector"
1,0,0, . . ., then for 0,1,0,0, . . ., and so forth), we can use CROUTP and SOL
to create a new subroutine INV.

INV is called by

CALL INV(A(0,0),AINV(0,0),SCRATCH(0),INDEX(0),IER,TYPEA,N)

A initially contains the n by n matrix to be inverted. When INV returns,
A contains the Crout decomposition, with the permutation index in INDEX;
AINV contains A^-1. IER equals 0 if A is nonsingular and -1 otherwise.
INV calls CROUTP to reduce A. It then sets up a "y column" in vector
SCRATCH n times, once for each column, and calls SOL to fill in the
columns of the inverse matrix, AINV.



;SUBROUTINE INV(A(0,0),AINV(0,0),SCRATCH(0),INDEX(0),IER,TYPEA,N)
; ASSUMPTIONS: A,AINV ARE N BY N MATRICES OF TYPE TYPEA
;              SCRATCH IS SINGLE PRECISION
;              INDEX,TYPEA,N ARE INTEGERS
; INV INVERTS A INTO AINV

          PUBLIC INV
          ASSUME CS:CSEG,ES:ESEG
INV       PROC  FAR
          PUSH  BP
          MOV   BP,SP
;SET UP STACK AREA IN ESEG
          PUSH  ES
          CALL  NEXT
NEXT:     POP   AX
          SUB   AX,(OFFSET NEXT)-(OFFSET FIRST_INST)
          MOV   CL,4
          SHR   AX,CL
          MOV   BX,CS
          ADD   BX,ESEG
          SUB   BX,CSEG
          ADD   AX,BX
          MOV   ES,AX
          MOV   LOCAL_SPACE,SS
          MOV   LOCAL_SPACE+2,SP
          MOV   AX,ES
          MOV   SS,AX
          MOV   SP,OFFSET STACK_TOP
          MOV   BX,DS:[BP]+6       ;BX=ADDR(N)
          MOV   CX,[BX]            ;CX=N
          CMP   CX,0
          JG    ARND
          JMP   DONE
ARND:
;CALL CROUTP(A,INDEX,IER,TYPEA,N)
;SET UP CALL PARAMETERS
          PUSH  DS:[BP]+18         ;ADDR(A(0,0))
          PUSH  DS:[BP]+12         ;ADDR(INDEX(0))
          PUSH  DS:[BP]+10         ;ADDR(IER)
          MOV   BX,DS:[BP]+8       ;BX=ADDR(TYPEA)
          PUSH  [BX]               ;TYPEA
          PUSH  CX                 ;N
          CALL  CROUTP
;WAS IT SINGULAR?
          MOV   BX,DS:[BP]+10      ;BX=ADDR(IER)
          CMP   WORD PTR [BX],0    ;IER=0?
          JE    ARND2
          JMP   DONE
ARND2:
;SOLVE FOR COLUMNS I=0 TO N-1
          MOV   SI,0               ;KEEP I IN SI
INV_LOOP:
;CLEAR SCRATCH
          MOV   BX,DS:[BP]+6
          MOV   CX,[BX]            ;CX=N
          MOV   BX,DS:[BP]+14      ;BX=ADDR(SCRATCH(0))
ZERO_LOOP:
          MOV   WORD PTR [BX],0
          MOV   WORD PTR [BX]+2,0  ;SCRATCH(J)=0
          ADD   BX,4               ;J=J+1
          LOOP  ZERO_LOOP
;NOW FILL IN APPROPRIATE 1
          MOV   BX,SI              ;BX=I
          SHL   BX,1
          SHL   BX,1               ;BX=4*I
          ADD   BX,WORD PTR DS:[BP]+14  ;BX=ADDR(SCRATCH(I))
          FLD1                     ;PUSH 1.0 ONTO STACK
          FSTP  DWORD PTR [BX]     ;STORE INTO MEMORY
;GET ADDRESS OF AINV(0,I)
          MOV   BX,DS:[BP]+6       ;BX=ADDR(N)
          MOV   AX,[BX]            ;AX=N
          MUL   SI                 ;AX=N*I
          MOV   BX,DS:[BP]+8       ;BX=ADDR(TYPEA)
          MUL   WORD PTR [BX]      ;AX=TYPEA*N*I
          ADD   AX,DS:[BP]+16      ;AX=ADDR(AINV(0,I))
;CALL SOL(A,SCRATCH,AINV(0,I),INDEX,TYPEA,4,TYPEA,N)
          PUSH  DS:[BP]+18         ;ADDR(A(0,0))
          PUSH  DS:[BP]+14         ;ADDR(SCRATCH(0))
          PUSH  AX                 ;ADDR(AINV(0,I))
          PUSH  DS:[BP]+12         ;ADDR(INDEX(0))
          MOV   BX,DS:[BP]+8       ;BX=ADDR(TYPEA)
          PUSH  [BX]               ;TYPEA
          MOV   AX,4
          PUSH  AX                 ;4 (SCRATCH IS SINGLE PRECISION)
          PUSH  [BX]               ;TYPEA
          MOV   BX,DS:[BP]+6
          PUSH  [BX]               ;N
          CALL  SOL
          INC   SI                 ;I=I+1
          MOV   BX,DS:[BP]+6       ;BX=ADDR(N)
          CMP   SI,[BX]            ;I>=N?
          JGE   DONE
          JMP   INV_LOOP
DONE:     MOV   SP,LOCAL_SPACE+2
          MOV   SS,LOCAL_SPACE
          POP   ES
          POP   BP
          RET   14
INV       ENDP
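As a cross-check on the logic of INV, here is a minimal Python sketch of the same strategy: reduce the matrix once, then solve for each unit "y column" in turn. It is not the book's routine. It substitutes a plain LU factorization with partial pivoting (Doolittle form, unit diagonal on L) for CROUTP and SOL, stores a matrix as a Python list of rows, and raises an exception where INV would set IER to -1. The names lu_decompose, lu_solve and invert are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lu_decompose(a: Matrix) -> Tuple[Matrix, List[int]]:
    """LU with partial pivoting, packed into one matrix.

    Below the diagonal: multipliers of L (unit diagonal implied).
    On and above the diagonal: U. perm[i] is the original row now in row i.
    """
    n = len(a)
    lu = [row[:] for row in a]          # work on a copy (INV reduces A in place)
    perm = list(range(n))
    for k in range(n):
        # partial pivoting: largest magnitude in column k, rows k..n-1
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular")  # INV sets IER = -1
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu: Matrix, perm: List[int], b: List[float]) -> List[float]:
    """Solve A x = b using the packed factors from lu_decompose."""
    n = len(lu)
    y = [0.0] * n
    for i in range(n):                  # forward substitution, unit-diagonal L
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):        # back substitution with U
        s = sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / lu[i][i]
    return x


def invert(a: Matrix) -> Matrix:
    """Invert A column by column, the way INV uses its SCRATCH vector."""
    n = len(a)
    lu, perm = lu_decompose(a)          # reduce once
    cols = []
    for i in range(n):
        scratch = [0.0] * n             # clear SCRATCH ...
        scratch[i] = 1.0                # ... and fill in the appropriate 1
        cols.append(lu_solve(lu, perm, scratch))
    # cols[i] is column i of the inverse; return the result as rows
    return [[cols[j][i] for j in range(n)] for i in range(n)]
```

For example, invert([[4.0, 3.0], [6.0, 3.0]]) returns approximately [[-0.5, 0.5], [1.0, -0.667]]. As in INV, the expensive reduction happens once and each column costs only an order n^2 solve.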




Of Linear Things Not Covered Above 

Two chapters and hundreds of lines of code will have to suffice as an 
introduction to matrix methods. Before moving on, it's worth listing a 
few of the things not covered here. This is also a good point to pause 
for a review of some general themes in programming the 8087. 

The routines in these two chapters will do just about every ordinary
thing you usually need to do with a matrix. However, if you have really
large problems, you may soon develop an interest in extraordinary
procedures. Everything you need to know about the 8087 is included here,
but there are many sophisticated algorithms that we haven't even touched
upon. These algorithms appear in many excellent books on numerical
computation. Two exceptional books are:

Elementary Numerical Analysis, by S. D. Conte, McGraw-Hill. 

Introduction to Matrix Computations, by G. W. Stewart, Academic Press. 
One of the best ways to learn more about the "tricks" of numerical
programming is to browse through the documentation of a large
numerical programming subroutine library. Such a "library collection" can
usually be found at your local college computer center. The IMSL library
is particularly good. You will find an excellent applied discussion of
numerical methods and the IMSL library in this moderately advanced
book:

Numerical Methods, Software, and Analysis, by John R. Rice, McGraw-Hill.

The procedures in this and the last chapter provide the basic foundation 
for matrix programming. Some advanced topic areas not covered here 
include: 

• Matrices with a lot of zeros. Problems in the natural sciences often give
rise to special forms of matrices. For example, if we know a matrix
is triangular, we can avoid processing the zeros. That would double
the speed of matrix multiplication. As we've seen, taking advantage
of the shape of a triangular matrix can improve the speed of matrix
inversion or solving a system of equations by a factor of n. Other
special forms include the "diagonal matrix," in which all off-diagonal
elements are zero; the "band matrix," which is zero except for
elements close to the diagonal; and the general designation of a "sparse
matrix." Sparse matrices, which often arise from solving systems of
differential equations, may be 99 percent zeros. Special storage
techniques, in which only the nonzero elements are stored, must be used
to work with sparse matrices.

• Matrices with special mathematical properties. Sometimes the
mathematics of a problem supplies special information about the structure
of the numbers stored in a matrix. For example, many problems give
rise to a symmetric matrix, one in which A(i,j) equals A(j,i). You can
double the speed of many matrix operations by taking advantage of
symmetry. Symmetric matrices are especially common in statistical
work.

• Super-high accuracy methods. One of the lessons of numerical
programming is that mathematically correct procedures don't always
give the mathematically correct answer when executed on a
computer with finite precision. One place this lesson is often learned is
in the reduction of a matrix to triangular form. We picked the Crout
reduction with partial pivoting because it is particularly well suited
to the 8087's high-precision, temporary real format internal registers.
Nonetheless, you may eventually want to learn about other methods,
including iterative techniques, which are much slower, but which
can be much more accurate.
8087 Matrix Program Review

1. Expect assembly language programs to be 10 times as long as their
BASIC counterparts. Writing large programs in assembler is time
consuming and error prone. In fact, the expense in programming
time may be prohibitive (even if the time is "free" because it's
your own).

2. 8087 assembly language programs may be 100 or more times faster
than BASIC programs. In fact, when attempting large problems in
BASIC, the expense in computer time may be prohibitive (even if
the time is "free" because the computer is already paid for).

3. Optimize the inner-most loop. Worry about optimizing, for speed and
for accuracy, the equivalent of the inner-most FOR/NEXT loop. For
matrix operations, this usually means having an inner product
routine carefully hand-coded for the 8087. We chose Crout
decomposition over Gaussian elimination for two reasons. First, the inner
product specification allowed accumulation in a high-precision
register even if the overall operation is only single precision. Second,
this specification allows us, if we wish, to code just the inner product
routine in assembler and leave the shell of the program in BASIC.

4. Never invert a matrix when you really need only to solve a system of
equations. Reducing a matrix is an order n^3 operation. Inverting a
reduced matrix requires an additional order n^3 operations, while
solving a system of equations only requires an additional order n^2
operations. A series of solutions is best obtained with one call to
REDUCE and several calls to SOLVE, not one call to INV and several
MATMULTs. The principal exception to this rule occurs when the
inverse matrix itself has an important interpretation, as it frequently
does in statistical applications.

5. The 8088 can do most bookkeeping faster than the 8087 can do
floating point arithmetic, so most 8088 operations run in parallel
with the 8087, and the 8087's speed is the limiting factor. An exception is the
integer multiply used in addressing matrix elements. It pays to keep
integer multiplication out of inner-most loops. Sometimes
multiplication can be avoided by adding to a location counter at each
loop. At other times, a "left shift" can be substituted for each
multiply-by-2. (Not coincidentally, Intel made the multiply instruction
on its newer processors, the 80188 and 80186, three times as fast as on
the original 8088 and 8086.)

6. Counter testing can be done at either the top or the bottom of a
loop. The choice is largely a matter of style. (Loops which use the
8088 LOOP instruction test more naturally at the bottom.) Some of
the programs in the last two chapters test at the top and some test
at the bottom, so that you can see both methods. Ordinarily, it's
good programming practice to choose one style or the other and
stick to it.

7. Subroutine calls, subroutine relocation code, and a few other
instruction sequences are very repetitive. If you use the MACRO
assembler, you might want to replace these sequences with macros.

8. To conserve storage, the routines here reduce a matrix "in place." 
If you need to save the original matrix, make a copy first using 
GCOPY from Chapter 9. 
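Item 5 can be made concrete with a small Python model of the address arithmetic. It assumes column-major storage, the layout implied by the way INV computes the address of AINV(0,I) as TYPEA*N*I, and element sizes of 4 or 8 bytes. The first function spends one multiply per element; the second hoists the multiply out of the loop and replaces multiply-by-4 and multiply-by-8 with left shifts. Both function names are hypothetical.

```python
def offsets_with_multiply(n: int, elem_size: int, row: int, n_cols: int) -> list:
    """Byte offsets of A(row,0), A(row,1), ... in an n-row column-major array.
    One integer multiply per element: the pattern item 5 warns against."""
    return [elem_size * (j * n + row) for j in range(n_cols)]


def offsets_with_increment(n: int, elem_size: int, row: int, n_cols: int) -> list:
    """Same offsets with the multiply moved out of the inner loop."""
    assert elem_size in (4, 8), "single (4) or double (8) precision only"
    step = elem_size * n            # one multiply, done once outside the loop
    offset = row << 2               # 4*row as two left shifts (SHL BX,1 twice)
    if elem_size == 8:
        offset <<= 1                # a third shift gives 8*row
    result = []
    for _ in range(n_cols):
        result.append(offset)
        offset += step              # an ADD replaces the MUL on every pass
    return result
```

The two functions return identical lists; only the work inside the loop differs.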

Onward, Non-linearly 

We set aside matrix operations here, and move on to non-linear
operations in Chapter 12. If you'd like some practical applications of our matrix
routines, skip ahead to the discussion of statistical computing in Chapter
14.




Advanced Instruction Set


In this chapter, we pick up and complete the task laid aside at the end 
of Chapter 6, our description of the 8087 instruction set. Describing the 
use of the most advanced instructions is rather long and technical; on a 
first reading you may want to proceed directly to the next chapter. 


The Cookbook—Chapter 12 


Program: 

LN 

Purpose: 

Natural logarithm (base e). 

Input: 

8087 register ST; requires ST>0. 

Output: 

8087 register ST; new ST = log(old ST). 

Language: 

8087/8088 assembly language. 

Note: 

NEAR procedure. 

Program: 

LOG10

Purpose: 

Common logarithm (base 10). 

Input: 

8087 register ST; requires ST>0. 

Output: 

8087 register ST; new ST = log10(old ST).

Language: 

8087/8088 assembly language. 

Note: 

NEAR procedure. 

Program: 

TWO2THEZ

Purpose: 

Raises 2 to the power Z. 

Input: 

Z in 8087 register ST. 

Output: 

8087 register ST; new ST = 2^Z.

Language: 

8087/8088 assembly language. 

Note: 

NEAR procedure. 

Program: 

EXP 

Purpose: 

Raises e to the power X. 

Input: 

X in 8087 register ST. 

Output:

8087 register ST; new ST = e^X.

Language: 

8087/8088 assembly language. 

Note: 

NEAR procedure. 

Program: 

TEN2THEX 

Purpose: 

Raises 10 to the power X. 

Input: 

X in 8087 register ST. 

Output: 

8087 register ST; new ST = 10^X.

Language: 

8087/8088 assembly language. 

Note: 

NEAR procedure. 

Program: 

Y2THEX 

Purpose: 

Raises Y to the power X. 

Input: 

Y in 8087 register ST; requires Y>0.

X in 8087 register ST(1).

Output:

8087 register ST; new ST = (old ST)^(old ST(1)).

Language: 

8087/8088 assembly language. 

Note: 

NEAR procedure. 

Program: 

TANGENT 

Purpose: 

Compute tangent. 

Input: 

8087 register ST (angle in radians). 

Output: 

8087 register ST; new ST = tan(old ST). 

Language: 

8087/8088 assembly language. 

Note: 

NEAR procedure. 

Program: 

SINE 

Purpose: 

Compute sine. 

Input: 

8087 register ST (angle in radians). 

Output: 

8087 register ST; new ST = sin(old ST). 

Language: 

8087/8088 assembly language. 

Note: 

NEAR procedure. 

Program: 

COSINE 

Purpose: 

Compute cosine.

Input:

8087 register ST (angle in radians).

Output:

8087 register ST; new ST = cos(old ST).

Language: 

8087/8088 assembly language. 

Note: 

NEAR procedure. 

Program: 

ARCTAN 

Purpose: 

Compute arctangent. 

Input: 

8087 register ST. 

Output: 

8087 register ST; new ST = arctan(old ST). 

Language: 

8087/8088 assembly language. 

Note: 

NEAR procedure. 


This chapter is divided into four sections. The first two sections finish
describing the arithmetic and constant instructions. The last two sections
present the transcendental and processor control instructions. A number of
examples are included for the more intricate operations.


Arithmetic instructions 

Four arithmetic instructions remain to be discussed. 

FRNDINT {ST} 9 microseconds

FRNDINT (round to integer) rounds the element on top of the 8087 stack
to an integer. (The number continues to be represented as a temporary
real; after an FRNDINT the temporary real number has an integer value.)
The rounding follows the rounding control (RC) field of the 8087 control
word, which selects one of four modes: round to nearest, round down, round
up, and chop (round toward zero). Round to nearest is the default mode.
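A minimal Python model of the four rounding modes, together with the control-word arithmetic that selects one, may help. The RC encoding used here (bits 10 and 11: 00 nearest, 01 down, 10 up, 11 chop) is the standard 8087 layout. The function names are hypothetical, Python floats stand in for temporary reals, and the sketch ignores the sign of a zero result.

```python
import math

# RC field values in the control word (bits 11-10)
RC_BITS = {"nearest": 0x0000, "down": 0x0400, "up": 0x0800, "chop": 0x0C00}


def set_rounding(control_word: int, mode: str) -> int:
    """Clear the RC field with mask 0F3FFH, then install the new mode."""
    return (control_word & 0xF3FF) | RC_BITS[mode]


def frndint_model(x: float, control_word: int) -> float:
    """Round x to an integral value using the RC field of control_word."""
    rc = control_word & 0x0C00
    if rc == RC_BITS["nearest"]:
        return float(round(x))        # Python's round() breaks ties to even
    if rc == RC_BITS["down"]:
        return float(math.floor(x))
    if rc == RC_BITS["up"]:
        return float(math.ceil(x))
    return float(math.trunc(x))       # chop: round toward zero
```

With a control word whose RC field is 00, frndint_model(2.5, cw) gives 2.0 and frndint_model(3.5, cw) gives 4.0. Switching to "down" changes -2.5 into -3.0.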

FSCALE {ST,ST(1)} 7 microseconds

FSCALE (scale by powers of two) adds the value found in ST(1) to the
exponent of ST. This effectively multiplies the top of stack element by 2
to the power contained in ST(1). Since the exponent field is an integer,
the value in ST(1) should be an integer as well. If ST(1) is not an integer,
the value is rounded toward zero before being added to the exponent in
ST. The scale factor in ST(1) must be between -32768 and 32768 (2^15).
If the scale factor is out of range or is a non-integer value between -1 and
+1, the result is undefined. For safety, load ST(1) from a word integer.
Notice that FSCALE provides an extremely fast way to multiply or divide
numbers by a power of 2.
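FSCALE plays the same role as the C library's ldexp, which Python exposes as math.ldexp. A minimal sketch, assuming the range rules stated above (the function name is hypothetical):

```python
import math


def fscale_model(st: float, st1: float) -> float:
    """Model of FSCALE: ST * 2**(ST(1) rounded toward zero)."""
    if 0.0 < abs(st1) < 1.0 or abs(st1) > 32768.0:
        # the 8087 leaves the result undefined here; the model refuses
        raise ValueError("scale factor out of range")
    n = math.trunc(st1)           # non-integers are chopped toward zero
    return math.ldexp(st, n)      # exact: only the exponent changes
```

For example, fscale_model(3.0, 4.0) is 48.0, and fscale_model(3.0, -2.7) chops the scale factor to -2 and returns 0.75.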

FPREM {ST,ST(1)} 25 microseconds

FPREM (partial remainder) divides the stack top by ST(1) and places the
remainder back in the stack top. (We explain the use of the name "partial"
below.) The result is exact, with no loss of precision. FPREM (in effect)
repeatedly subtracts ST(1) from ST and leaves the remainder in ST. When
no more subtractions can be done without getting a negative difference,
FPREM quits. Thus, if ST initially holds X, at completion of FPREM ST
holds X - (q × ST(1)), where q is an integer.

FPREM will, however, only reduce the difference in magnitude between
ST and ST(1) by up to 2^64 in a single execution. If the difference is
greater than this, repeated executions are necessary. (The 8087 doesn't
allow itself to be interrupted in the middle of an instruction. Some
programs might want to interrupt the 8087 in a bit of a hurry, so FPREM
was designed to work part way through a modular division problem at each
execution.) At each step, the "partial remainder" is left in ST. At the
end of each execution, three possible comparisons exist between ST and
ST(1). If ST<ST(1), the remainder is in ST. If ST = ST(1), the remainder
is 0. If ST>ST(1), then ST has only the partial remainder and FPREM should
be repeated. FPREM sets bit C2 of the status word when it needs to be
repeated and clears the bit when it has completed. FPREM also places the
least-significant three bits of the quotient, q, in bits C0, C3, and C1,
which is quite useful in analyzing periodic functions, such as sine,
cosine, and tangent. For example, if C0, C3, and C1 all equal zero, then
the quotient is a multiple of eight. If C1 alone equals one, then the
quotient is one greater than a multiple of eight. (Why eight? Because
trigonometric calculations are based on dividing a circle into eight
parts.) Table 12-1 describes the possible bit patterns.


Table 12-1. Condition Code Bits After FPREM.

Least Significant Bits
of Quotient              C0   C3   C1
        0                0    0    0
        1                0    0    1
        2                0    1    0
        3                0    1    1
        4                1    0    0
        5                1    0    1
        6                1    1    0
        7                1    1    1


The most important use of FPREM is in bringing arguments into the 
valid range for the transcendental instructions. Examples using FPREM 
are given in the section "Trigonometric Functions," below. 
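A minimal Python model of the completed FPREM result (that is, after FPREM has been repeated until C2 is clear) shows how the remainder and the three condition bits of Table 12-1 relate to the quotient. Exact rational arithmetic from the fractions module stands in for the 8087's exact remainder. The sketch only handles a non-negative dividend and a positive divisor, the case the trigonometric routines in this chapter create by taking FABS first; the function name is hypothetical.

```python
from fractions import Fraction


def fprem_model(x: float, y: float):
    """Completed FPREM for x >= 0, y > 0.

    Returns (remainder, (C0, C3, C1)), where the bits are the three
    least-significant bits of the integer quotient q (Table 12-1).
    """
    if x < 0 or y <= 0:
        raise ValueError("this sketch only models x >= 0 and y > 0")
    q = int(Fraction(x) / Fraction(y))      # quotient, truncated toward zero
    r = Fraction(x) - q * Fraction(y)       # exact remainder X - q*ST(1)
    return float(r), ((q >> 2) & 1, (q >> 1) & 1, q & 1)
```

For example, fprem_model(10.0, 3.0) gives a remainder of 1.0 and bits (0, 1, 1), the row for a quotient of 3 in Table 12-1.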

FXTRACT {ST} 10 microseconds 

FXTRACT (extract exponent and significand) separates out the exponent 
and significand of the top of stack element. The exponent replaces the 
top of stack element and the significand is then pushed onto the stack. 
(Both are represented as temporary reals.) If ST originally held zero, both 
exponent and significand are zero. Note that FXTRACT is the logical 
inverse of FSCALE. 
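Python's math.frexp performs a similar split, but it normalizes the significand to the range 1/2 to 1, while the 8087 significand lies between 1 and 2. A minimal sketch of FXTRACT built on frexp, with ldexp (FSCALE's counterpart) undoing it (the function name is hypothetical):

```python
import math


def fxtract_model(x: float):
    """Model of FXTRACT: returns (exponent, significand) with
    x == significand * 2**exponent and 1 <= |significand| < 2.
    On the 8087 the significand ends up in ST and the exponent in ST(1)."""
    if x == 0.0:
        return 0.0, 0.0              # zero in, zeros out, as described above
    m, e = math.frexp(x)             # frexp gives 0.5 <= |m| < 1
    return float(e - 1), m * 2.0     # shift one bit to reach the 8087 range
```

For example, fxtract_model(12.0) returns (3.0, 1.5), and math.ldexp(1.5, 3) rebuilds 12.0.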

Constant Instructions 

The 8087 has seven useful constants "hardwired in." These constants
have full temporary real accuracy (over 19 decimal digits). Use of a
constant instruction saves about eight microseconds, and considerable
nuisance, compared to retrieving data from memory.

The constants are zero, one, pi, and four logarithmic values. 

FLDZ {ST} 3 microseconds

FLDZ (load zero) pushes 0.0 onto the stack.

FLD1 {ST} 4 microseconds

FLD1 (load one) pushes 1.0 onto the stack.

FLDPI {ST} 4 microseconds

FLDPI (load pi) pushes pi onto the stack.

FLDL2T {ST} 4 microseconds

FLDL2T (load log_2(10)) pushes log_2(10) onto the stack.

FLDL2E {ST} 4 microseconds

FLDL2E (load log_2(e)) pushes log_2(e) onto the stack. (e is the base of the
natural logarithms.)

FLDLG2 {ST} 4 microseconds

FLDLG2 (load common logarithm of 2) pushes log_10(2) onto the stack.

FLDLN2 {ST} 4 microseconds

FLDLN2 (load natural logarithm of 2) pushes log_e(2) onto the stack.


Transcendental Instructions 

Five transcendental instructions are provided on the 8087. Two of these
instructions are used for logarithmic calculations, one for exponentiation,
and two for trigonometric calculations. The five instructions provide core
calculations for a much larger set of transcendental operations. We have
written this section in two parts. In the first part we describe the five
instructions. In the second part we present a series of 8087 NEAR
procedures that can be used for the most common transcendental functions.

The transcendental instructions require valid (normalized) arguments
and require that the arguments be within range. Further, the
transcendental instructions do not check their arguments. Invalid or
out-of-range arguments may silently produce erroneous results.

F2XM1 {ST} 100 microseconds

F2XM1 (2 to the X, minus 1) takes the stack top as X, calculates 2^X - 1,
and places the answer back in the stack top. X must be between 0 and
1/2, inclusive. While calculating 2^X - 1 instead of 2^X seems peculiar at first,
this method allows much more accuracy when X is small. For example,
2^0.000001 is approximately 1.000000693. Subtracting one allows the 8087 to
report, in this case, about seven extra significant digits.

Below, we show how to use F2XM1 to raise bases other than two to a
power.
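Python's math.expm1 exists for the same reason as F2XM1: it returns e^x - 1 without first forming e^x and losing the leading 1. A minimal sketch of F2XM1 built on it, using the identity 2^X - 1 = e^(X ln 2) - 1 (the function name is hypothetical):

```python
import math


def f2xm1_model(x: float) -> float:
    """2**x - 1 for 0 <= x <= 1/2, computed without first forming 2**x."""
    if not 0.0 <= x <= 0.5:
        raise ValueError("F2XM1 argument must lie between 0 and 1/2")
    return math.expm1(x * math.log(2.0))


# Naive form for comparison: 2.0 ** 1e-6 - 1.0 keeps only about
# ten significant digits, because most of the bits went to the leading 1.
small = f2xm1_model(1e-6)   # about 6.931474e-07, at full precision
```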
FYL2X {ST,ST(1)} 190 microseconds

FYL2X (Y times log_2 X) calculates Y × log_2 X, where X is in ST and Y in
ST(1). The stack is popped, eliminating X, and the answer then replaces
Y in the new top of stack. X must be strictly positive.

Below, we show how to use FYL2X to calculate logarithms using bases
other than two.


FYL2XP1 {ST,ST(1)} 170 microseconds

FYL2XP1 (Y times log_2(X + 1)) takes X from the stack top, Y from ST(1),
and calculates Y × log_2(1 + X). X is popped and the result replaces Y on
the new stack top. The absolute value of X must be less than
1 - SQRT(2)/2 (about 0.29). FYL2XP1 should be used in preference to FYL2X
when the argument of the logarithm, 1 + X, is very close to one.
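The advantage of FYL2XP1 can be shown with Python's math.log1p, which computes log(1 + x) without first rounding 1 + x. A minimal sketch of both instructions (the function names are hypothetical, and no argument-range checks are modeled):

```python
import math


def fyl2x_model(y: float, x: float) -> float:
    """Y * log2(X), X > 0."""
    return y * math.log2(x)


def fyl2xp1_model(y: float, x: float) -> float:
    """Y * log2(1 + X), accurate even when X is tiny."""
    return y * math.log1p(x) / math.log(2.0)


tiny = 1e-12
good = fyl2xp1_model(1.0, tiny)        # about 1.4427e-12, nearly full precision
poor = fyl2x_model(1.0, 1.0 + tiny)    # 1.0 + tiny was already rounded
```

In poor, the sum 1.0 + 1e-12 is rounded before the logarithm sees it, so only about four significant digits survive. Passing X itself to FYL2XP1 avoids that rounding step.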

FPTAN {ST} 90 microseconds 

FPTAN (partial tangent) calculates tan(theta), where theta is in the stack 
top. The argument theta is restricted to the range 0 < theta < pi/4. The 
answer is in the form of a ratio Y/X. Y replaces theta and X is pushed onto 
the stack. 

We can translate from tangent to sine and cosine by use of standard 
trigonometric identities. (See "Trigonometric Functions", below.) 
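The identities behind that translation are short. If FPTAN leaves a pair Y and X with Y/X = tan(theta), and both are positive, then sin(theta) = Y/SQRT(X^2+Y^2) and cos(theta) = X/SQRT(X^2+Y^2). A minimal Python sketch, in which math.tan stands in for FPTAN (a real 8087 may return a different pair with the same ratio); the function names are hypothetical:

```python
import math


def fptan_model(theta: float):
    """Stand-in for FPTAN: returns (y, x) with y / x == tan(theta)."""
    if not 0.0 < theta < math.pi / 4:
        raise ValueError("FPTAN argument must satisfy 0 < theta < pi/4")
    return math.tan(theta), 1.0


def sin_cos_from_ratio(y: float, x: float):
    """sin = y / sqrt(x^2 + y^2), cos = x / sqrt(x^2 + y^2) for x > 0."""
    h = math.hypot(x, y)
    return y / h, x / h
```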

FPATAN {ST,ST(1)} 130 microseconds 

FPATAN (partial arctangent) calculates arctan (Y/X) where X is taken 
from ST and Y from ST(1). Y and X must observe the inequality 0 < Y 
< X < infinity. FPATAN pops the stack and then places the answer in 
the new stack top, replacing Y. 

FPATAN serves as a base for calculating all the inverse trigonometric 
functions. 


In the following sections, we create a number of "super instructions." 
Each "super instruction" is an 8087/8088 NEAR procedure that computes 
a common mathematical function. The procedures all assume that the 
calling routine has provided necessary scratch space and defined required 
constants. The calling routine should look something like the following. 


;CALLING ROUTINE FOR "SUPER-INSTRUCTIONS"
CSEG      SEGMENT 'CODE'
          ASSUME CS:CSEG,ES:ESEG
;WE SHOULD SAVE ANY REGISTERS AS REQUIRED
          MOV   AX,ESEG            ;POINT TO SCRATCH AREAS
          MOV   ES,AX
          MOV   SS,AX
          MOV   SP,OFFSET STACK_TOP
          CALL  SUPER_INSTRUCTION
CSEG      ENDS

ESEG      SEGMENT 'DATA'
STATUS_WORD        DW  ?
CONTROL_WORD       DW  ?
CONTROL_WORD_TEMP  DW  ?
HALF               DD  3F000000H   ;0.5 IN SHORT-REAL FORMAT
MINUS2             DW  -2
SIGN_STORE         DB  ?
REALLY_COS         DB  ?
LOCAL_SPACE        DW  10 DUP(?)
STACK_AREA         DW  50 DUP(?)
STACK_TOP          EQU THIS WORD
ESEG      ENDS
          END


The diskette prepared for this book includes routines to call each of the 
"super-instructions" from BASIC. 


Logarithms 


The 8087 hardware calculates logarithms for log base two. Most
mathematical applications require natural logarithms, log base e, or common
logarithms, log base 10. These are easily calculated using the fundamental
identity for changing the base of a logarithm. Suppose we want the log
of X base n, and only know how to calculate logarithms using base two:

$$\log_n X = (\log_n 2)(\log_2 X)$$

In this case n is e or 10. The following "super instructions" assume X
is on the stack top, that 0 < X < infinity, and that the stack is not too
deep to be pushed at least once more. X is replaced with its logarithm.


;NATURAL LOG {ST}
;SUBROUTINE LN
LN        PROC  NEAR
          FLDLN2                   ;PUSH LOG BASE E OF 2
          FXCH                     ;SWAP ST,ST(1)
          FYL2X                    ;POP AND REPLACE ST WITH NATURAL LOG
          RET
LN        ENDP

;COMMON LOG {ST}
;SUBROUTINE LOG10
LOG10     PROC  NEAR
          FLDLG2                   ;PUSH LOG BASE 10 OF 2
          FXCH                     ;SWAP ST,ST(1)
          FYL2X                    ;POP AND REPLACE ST WITH COMMON LOG
          RET
LOG10     ENDP
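The two routines above differ only in the constant they push. A minimal Python model of that pattern, with one function standing in for FYL2X (all names are hypothetical):

```python
import math

LN2 = math.log(2.0)      # the value FLDLN2 pushes
LG2 = math.log10(2.0)    # the value FLDLG2 pushes


def fyl2x(y: float, x: float) -> float:
    """Model of FYL2X: y * log2(x), x > 0."""
    return y * math.log2(x)


def ln(x: float) -> float:
    return fyl2x(LN2, x)         # LN: FLDLN2, FXCH, FYL2X


def log10(x: float) -> float:
    return fyl2x(LG2, x)         # LOG10: FLDLG2, FXCH, FYL2X
```

The FXCH in each routine exists only to put X in ST and the constant in ST(1), the order FYL2X expects. In the model, that ordering is simply the argument order of fyl2x.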
Exponentiation

The 8087 hardware provides the instruction F2XM1 for raising 2 to the
power X. Mathematical calculations often require e^X, 10^X, and y^X. These
are easily calculated using the fundamental identity for changing the base
of an exponent. Suppose we want y^X and only know how to calculate
2^X:

$$y^X = 2^{X \log_2 y}$$

Exponentiation routines would be simple if we had an instruction to
raise 2 to an arbitrary power. Since F2XM1 only accepts arguments
between 0 and 1/2, we need a super-instruction to perform the operation
2^Z for an arbitrary Z. The 8087 instruction set is organized to make this
a relatively easy operation, though a bit of planning is required. We
actually have two hardware operations for taking a power of two. F2XM1
accepts exponents between 0 and 1/2. FSCALE accepts any integer
exponent. We'll pick Z1 and Z2 such that Z1 is an integer and Z2 is a non-negative
fraction. If Z2 is greater than 1/2, we'll subtract 1/2 from Z2 and then multiply
the answer by 2^(1/2). (This is all easier than it sounds.) The algorithm works
as follows:

1. Let Z1 equal the greatest integer less than or equal to Z. This is a
little messy since we need to round Z down. In order to accomplish
this, we need to change the 8087 rounding control by using the load
control word (FLDCW) and store control word (FSTCW) instructions,
which we don't officially meet until the next section.

2. Let Z2 = Z - Z1. Note that Z2 is guaranteed to be non-negative and less than 1.

3. Is Z2 > 1/2? If so, subtract 1/2 and make note of the fact.

4. Raise 2 to the Z2 power and scale by Z1.

5. If we subtracted 1/2 from Z2 above, now multiply the result by 2^(1/2).
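The five steps can be checked with a minimal Python model. It is a sketch, not the book's code. math.floor plays the part of FRNDINT under round-down control, math.expm1(z2 * ln 2) + 1 plays F2XM1 followed by FLD1 and FADDP, and math.ldexp plays FSCALE. The model subtracts 1/2 when Z2 is at least 1/2, which is what a divide-by-1/2 FPREM does; at exactly 1/2 either choice gives the same answer. The function name is hypothetical.

```python
import math


def two_to_the_z(z: float) -> float:
    """2**z assembled the way the five steps describe."""
    z1 = math.floor(z)                  # step 1: round down
    z2 = z - z1                         # step 2: 0 <= z2 < 1
    subtracted_half = z2 >= 0.5         # step 3
    if subtracted_half:
        z2 -= 0.5
    result = math.expm1(z2 * math.log(2.0)) + 1.0   # step 4: 2**z2 ...
    if subtracted_half:
        result *= math.sqrt(2.0)        # step 5: restore the missing 1/2
    return math.ldexp(result, int(z1))  # ... scaled by 2**z1
```

For a negative Z the split still works: with Z = -2.25, Z1 is -3 and Z2 is 0.75, so the model computes 2^0.25 times the square root of 2, scaled by 2^-3.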

;2 TO THE Z {ST}
;SUBROUTINE TWO2THEZ
;
;THIS ROUTINE ASSUMES THAT THE FOLLOWING MEMORY LOCATIONS
;HAVE BEEN DEFINED:
;   STATUS_WORD         2 BYTES
;   CONTROL_WORD        2 BYTES
;   CONTROL_WORD_TEMP   2 BYTES
;   HALF                HAS 0.5 IN SHORT-REAL FORMAT
;
;Z IS ASSUMED TO BE FOUND IN ST
;THERE MUST BE AT LEAST 3 FREE STACK LOCATIONS
TWO2THEZ  PROC  NEAR
          PUSH  AX                 ;SAVE AX
          FSTCW CONTROL_WORD       ;SAVE CONTROL WORD SO WE CAN
                                   ;RESTORE IT LATER
          FSTCW CONTROL_WORD_TEMP  ;USE TEMP TO CHANGE
                                   ;ROUNDING CONTROL (RC)
          FWAIT
          AND   CONTROL_WORD_TEMP,0F3FFH ;CLEAR OUT RC BITS
          OR    CONTROL_WORD_TEMP,0400H  ;RC=ROUND DOWN
          FLDCW CONTROL_WORD_TEMP  ;SET TO ROUND DOWN
          FLD   ST(0)              ;PUSH COPY OF Z ONTO ST
          FRNDINT                  ;OK, ST=Z1, ST(1)=Z
          FLDCW CONTROL_WORD       ;RETURN THINGS TO NORMAL
          FSUB  ST(1),ST           ;ST(1)=Z2=Z-Z1
          FXCH                     ;ST=Z2, ST(1)=Z1
          FLD   HALF               ;LOAD 1/2 ONTO THE STACK
          FXCH                     ;ST=Z2, ST(1)=1/2
          FPREM                    ;ST HAS Z2 OR Z2-1/2
                                   ;C1=1 IN THE LATTER CASE
          FSTSW STATUS_WORD
          FWAIT                    ;NOW WE'VE GOT FLAGS SET
          FSTP  ST(1)              ;GET RID OF THE 1/2
          F2XM1                    ;ST=(2 TO THE ST)-1
          FLD1
          FADDP ST(1),ST           ;ST=2 TO THE ST
          TEST  BYTE PTR STATUS_WORD+1,00000010B
                                   ;ST HAD Z2 IF BIT 1 (C1) IS OFF
          JZ    WAS_Z2             ;OTHERWISE IT WAS Z2-1/2
          FLD1                     ;SO,
          FADD  ST,ST(0)           ;MULTIPLY BY THE
          FSQRT                    ;SQUARE ROOT OF 2
          FMULP ST(1),ST
WAS_Z2:
          FSCALE                   ;WE JUST NEED TO SCALE
          FSTP  ST(1)              ;NOTICE WE DIDN'T CHECK
                                   ;FOR OVER OR UNDERFLOW
          POP   AX
          RET
TWO2THEZ  ENDP

This may all seem like going to some trouble, but it does speed things
up quite a bit over not having an 8087. How much? Try rewriting our
super instruction "2 to the Z" in BASIC without the 8087. You'll find
that one minute of 8087 exponentiation takes just about an hour with
compiled BASIC and about three hours with interpreted BASIC.

Of course, we aren't actually interested in raising two to some power
all that often. With TWO2THEZ firmly in hand, it's easy to provide
new super-instructions for e^X, 10^X, and y^X.
;EXP(X) {ST}                        322 MICROSECONDS
EXP       PROC  NEAR
          FLDL2E                   ;PUSH LOG BASE 2 OF E
          FMULP ST(1),ST           ;ST=X TIMES LOG BASE 2 OF E
          CALL  TWO2THEZ           ;ST=EXP(X)
          RET
EXP       ENDP

;10 TO THE X {ST}                   322 MICROSECONDS
TEN2THEX  PROC  NEAR
          FLDL2T                   ;PUSH LOG BASE 2 OF 10
          FMULP ST(1),ST           ;ST=X TIMES LOG BASE 2 OF 10
          CALL  TWO2THEZ           ;ST=10 TO THE X
          RET
TEN2THEX  ENDP

;Y TO THE X {ST(1),ST}              422 MICROSECONDS
;ASSUMES Y IS POSITIVE IN ST
;ASSUMES X IN ST(1)
Y2THEX    PROC  NEAR
          FYL2X                    ;ST=X TIMES LOG BASE 2 OF Y
          CALL  TWO2THEZ           ;ST=Y TO THE X
          RET
Y2THEX    ENDP
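All three super-instructions are the identity y^X = 2^(X log_2 y) with a different y. A minimal Python model follows. Here 2.0 ** z stands in for TWO2THEZ, and math.log2 supplies the constants that FLDL2E and FLDL2T load; all function names are hypothetical.

```python
import math


def two2thez(z: float) -> float:
    """Stand-in for the TWO2THEZ super-instruction."""
    return 2.0 ** z


def exp_model(x: float) -> float:
    return two2thez(x * math.log2(math.e))    # FLDL2E, FMULP, CALL TWO2THEZ


def ten2thex(x: float) -> float:
    return two2thez(x * math.log2(10.0))      # FLDL2T, FMULP, CALL TWO2THEZ


def y2thex(y: float, x: float) -> float:
    if y <= 0.0:
        raise ValueError("Y2THEX requires Y > 0")
    return two2thez(x * math.log2(y))         # FYL2X, CALL TWO2THEZ
```

A design note: any rounding error in the product X log_2 y is amplified in the result, and more so for large arguments. In Python's double precision this costs a few low-order digits for big exponents. The 8087 forms the same product in its longer temporary real format, which absorbs most of that loss.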


Trigonometric Functions 

The tangent function provides the base for calculating all the common 
trigonometric functions. FPTAN calculates the tangent for arguments 
between 0 and pi/4. Computation of a trigonometric function involves 
three broad steps. First, prologue code is used to bring the argument 
within range of the FPTAN instruction. Second, the FPTAN instruction 
is applied. Third, epilogue code is used to correct the result of FPTAN. 
The trigonometric identities used are described in the code below. 

;TANGENT {ST}                       370 MICROSECONDS
;THETA IN ST IS ASSUMED TO BE A VALID NUMBER
;THERE MUST BE AT LEAST 2 FREE STACK LOCATIONS
;THIS ROUTINE ASSUMES THAT THE FOLLOWING MEMORY
;LOCATIONS HAVE BEEN DEFINED:
;   STATUS_WORD   2 BYTES
;   SIGN_STORE    1 BYTE
;   MINUS2        2 BYTES INITIALIZED TO -2
TANGENT   PROC  NEAR
          PUSH  AX
          PUSH  BX
;FIRST CHECK FOR A NEGATIVE ARGUMENT
;NOTE TAN(-X)=-TAN(X)
          MOV   SIGN_STORE,0       ;ASSUME POSITIVE
          FTST
          FSTSW STATUS_WORD
          FWAIT
          MOV   AH,BYTE PTR STATUS_WORD+1
          SAHF
          JNC   NON_NEGATIVE
          MOV   SIGN_STORE,-1      ;IT'S NEGATIVE
          FABS                     ;NOW POSITIVE
NON_NEGATIVE:
;NOW GET ST BETWEEN 0 AND PI/4
          FILD  MINUS2             ;LOAD -2
          FLDPI                    ;LOAD PI
          FSCALE                   ;GOT PI/4
          FSTP  ST(1)              ;DUMP -2
          FXCH                     ;NOW X IS IN ST AND PI/4 IN ST(1)
RANGE:
          FPREM
          FSTSW STATUS_WORD
          FWAIT
          MOV   AH,BYTE PTR STATUS_WORD+1
          SAHF
          JP    RANGE              ;THIS TESTS BIT C2
;AT THIS POINT AH HAS THE STATUS BITS
;NOW LET'S SEE IF THE REMAINDER WAS EXACTLY ZERO
          FTST
          FSTSW STATUS_WORD
          FWAIT
;IT WAS ZERO IF C3=1 AND C0=0
;IF ZERO, SET BX=-1, ELSE BX=0
          MOV   BX,0
          AND   BYTE PTR STATUS_WORD+1,01000001B
          CMP   BYTE PTR STATUS_WORD+1,01000000B
          JNE   NOT_ZERO
          MOV   BX,-1
NOT_ZERO:
;THERE ARE FOUR POSSIBILITIES GIVEN ST NOW HAS X MOD PI/4
;OCTANT  C3 C1   CALCULATE             IF ZERO
;0,4     0  0    FPTAN(ST)             0
;1,5     0  1    1/FPTAN(PI/4 - ST)    1
;2,6     1  0    -1/FPTAN(ST)          INFINITY
;3,7     1  1    -FPTAN(PI/4 - ST)     -1
;FIRST CHECK BIT C1 AND TAKE FPTAN
          TEST  AH,10B             ;IS C1 ON?
          JZ    C1ISOFF            ;JUMP IF OFF
          CMP   BX,0               ;ST EXACTLY ZERO?
          JNE   ST0ANDC1           ;JUMP IF YES
          FSUBP ST(1),ST           ;NOW PI/4-ST
          FPTAN
          JMP   TANDONE
ST0ANDC1:
          FSTP  ST                 ;POP ST
          FSTP  ST                 ;AND PI/4
          FLD1                     ;LOAD RATIO 1 TO 1
          FLD1
          JMP   TANDONE
C1ISOFF:
          FSTP  ST(1)              ;GET RID OF PI/4
          CMP   BX,0               ;ST EXACTLY ZERO?
          JNE   ST0ANDNOC1         ;JUMP IF YES
          FPTAN
          JMP   TANDONE
ST0ANDNOC1:
          FSTP  ST                 ;DUMP ST
          FLDZ                     ;LOAD RATIO 0 TO 1
          FLD1
TANDONE:
;PUT C1 XOR C3 IN BX
          MOV   BX,0               ;ASSUME C3 OFF
;IF C3 IS ON THEN CHANGE SIGNS
          TEST  AH,01000000B
          JZ    NOC3               ;JUMP IF OFF
          FCHS
          MOV   BX,1               ;NOTE C3 ON
NOC3:
;IS C1 ON?
          TEST  AH,10B
          JZ    NOC1               ;JUMP IF OFF
          XOR   BX,1
          JMP   RECIP
NOC1:     XOR   BX,0
RECIP:
;IF BX=1 THEN WE WANT RECIPROCAL OF RATIO
          CMP   BX,1
          JNE   NORECIP
          FXCH
NORECIP:  FDIVP ST(1),ST           ;THAT'S IT
;DID WE ORIGINALLY CHANGE SIGN?
          CMP   SIGN_STORE,0
          JE    LEAVE_POS
          FCHS
LEAVE_POS:
          POP   BX
          POP   AX
          RET
TANGENT   ENDP
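The octant bookkeeping in TANGENT can be followed in a minimal Python model. It mirrors the routine's steps: strip the sign, reduce modulo pi/4 (math.fmod gives an exact remainder, as FPREM does), take a ratio y/x from the remainder, change the sign of x when C3 is on, and swap the ratio when C1 XOR C3 is on. math.tan stands in for FPTAN, and the function name is hypothetical.

```python
import math


def tangent_model(theta: float) -> float:
    """Octant reduction used by TANGENT, with math.tan standing in for FPTAN."""
    sign = -1.0 if theta < 0.0 else 1.0       # TAN(-X) = -TAN(X)
    theta = abs(theta)
    quarter_pi = math.pi / 4.0
    q = int(theta // quarter_pi)              # FPREM quotient
    r = math.fmod(theta, quarter_pi)          # FPREM remainder, 0 <= r < pi/4
    c3, c1 = (q >> 1) & 1, q & 1
    # ratio y/x: tan(r) for even octants, tan(pi/4 - r) for odd ones
    y = math.tan(quarter_pi - r) if c1 else math.tan(r)
    x = 1.0
    if c3:                                    # C3 on: change sign
        x = -x
    if c1 ^ c3:                               # C1 XOR C3: take the reciprocal
        y, x = x, y
    if x == 0.0:                              # exact pole, e.g. octant 2 with r = 0
        return sign * math.copysign(math.inf, y)
    return sign * (y / x)
```

For theta = 2.0 the quotient is 2 (C3 on, C1 off), so the model returns -1/tan(2.0 - pi/2), about -2.185, which matches tan(2.0).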



Sine and cosine functions are also calculated using FPTAN. Since a
cosine is just a sine shifted by 90 degrees, we build the cosine routine to
make use of the code for sines.
;SINE {ST}                      513 MICROSECONDS
;THETA IN ST IS ASSUMED TO BE A VALID NUMBER
;THERE MUST BE AT LEAST 3 FREE STACK LOCATIONS
;THIS ROUTINE ASSUMES THAT THE FOLLOWING MEMORY LOCATIONS HAVE
;BEEN DEFINED:
;STATUS_WORD    2 BYTES
;SIGN_STORE     1 BYTE
;MINUS2         2 BYTES INITIALIZED TO -2
;REALLY_COS     1 BYTE
SINE    PROC    NEAR
        PUSH    AX
        PUSH    BX
;FIRST CHECK FOR A NEGATIVE ARGUMENT
;NOTE SIN(-X)=-SIN(X)
        MOV     SIGN_STORE,0            ;ASSUME POSITIVE
        FTST
        FSTSW   STATUS_WORD
        FWAIT
        MOV     AH,BYTE PTR STATUS_WORD+1
        SAHF
        JNC     NON_NEGATIVE
        MOV     SIGN_STORE,-1           ;ITS NEGATIVE
        FABS                            ;NOW POSITIVE
NON_NEGATIVE:
        MOV     REALLY_COS,0            ;SINE, NOT COSINE
COS_ENTRY:
;NOW GET ST BETWEEN 0 AND PI/4
        FILD    MINUS2                  ;LOAD -2
        FLDPI                           ;LOAD PI
        FSCALE                          ;GOT PI/4
        FSTP    ST(1)                   ;DUMP -2
        FXCH
;NOW X IS IN ST AND PI/4 IN ST(1)
RANGE:
        FPREM
        FSTSW   STATUS_WORD
        FWAIT
        MOV     AH,BYTE PTR STATUS_WORD+1
        SAHF
        JP      RANGE                   ;THIS TESTS BIT C2
;AT THIS POINT AH HAS THE STATUS BITS
;IF WE ARE REALLY DOING COSINE, WE NEED TO ADD TWO TO THE OCTANT
        CMP     REALLY_COS,0
        JE      ITS_SINE
;ADD INTO C3 AND CARRY INTO C0
        XOR     AH,01000000B
        TEST    AH,01000000B
        JNZ     NOCARRY
        XOR     AH,1B
NOCARRY:
ITS_SINE:
;NOW LETS SEE IF THE REMAINDER WAS EXACTLY ZERO
        FTST
        FSTSW   STATUS_WORD
        FWAIT
;IT WAS ZERO IF C3=1 AND C0=0
;IF ZERO, SET BX=-1, ELSE BX=0
        MOV     BX,0
        AND     BYTE PTR STATUS_WORD+1,01000001B ;KEEP ONLY C3 AND C0
        CMP     BYTE PTR STATUS_WORD+1,01000000B ;C3=1 AND C0=0?
        JNE     NOT_ZERO
        MOV     BX,-1
NOT_ZERO:
;THERE ARE FOUR POSSIBILITIES GIVEN ST NOW HAS X MOD PI/4
;OCTANT  C3 C1   CALCULATE               IF ZERO
;0       0  0    SIN(ST)                 0
;1       0  1    COS(PI/4 - ST)          SQRT(2)/2
;2       1  0    COS(ST)                 1
;3       1  1    SIN(PI/4 - ST)          SQRT(2)/2
;OCTANTS 4-7 ARE JUST LIKE 0-3 ONLY NEGATIVE
;NOTE: IF TAN(THETA)=X/Y, THEN
;       SIN(THETA)=X/SQRT(X*X+Y*Y)
;       COS(THETA)=Y/SQRT(X*X+Y*Y)
;FIRST CHECK BIT C1 AND TAKE FPTAN
        TEST    AH,10B                  ;IS C1 ON?
        JZ      C1ISOFF                 ;JUMP IF OFF
        CMP     BX,0                    ;ST EXACTLY ZERO?
        JNE     ST0ANDC1                ;JUMP IF YES
        FSUBP   ST(1),ST                ;NOW PI/4-ST
        FPTAN
        JMP     SINDONE
ST0ANDC1:
        FSTP    ST                      ;POP ST
        FSTP    ST                      ; AND PI/4
        FLD1                            ;LOAD RATIO 1 TO 1
        FLD1
        JMP     SINDONE
C1ISOFF:
        FSTP    ST(1)                   ;GET RID OF PI/4
        CMP     BX,0                    ;ST EXACTLY ZERO?
        JNE     ST0ANDNOC1              ;JUMP IF YES
        FPTAN
        JMP     SINDONE
ST0ANDNOC1:
        FSTP    ST                      ;DUMP ST
        FLDZ                            ;LOAD RATIO 0 TO 1
        FLD1
SINDONE:
;IS C1 XOR C3 TRUE?
        MOV     BX,0                    ;ASSUME C3 OFF
;IF C3 IS ON
        TEST    AH,01000000B
        JZ      NOC3                    ;JUMP IF OFF
        MOV     BX,1                    ;NOTE C3 ON
NOC3:
;IS C1 ON?
        TEST    AH,10B
        JZ      NOC1                    ;JUMP IF OFF
        XOR     BX,1
        JMP     DOSINE
NOC1:   XOR     BX,0
DOSINE:
;IF BX=1 THEN WE WANT COSINE FUNCTION
        CMP     BX,1
        JNE     SINFUNC
        FXCH
SINFUNC:
;ST(1)=X, ST(0)=Y
;SIN(THETA)=X/SQRT(X*X+Y*Y)
        FMUL    ST(0),ST(0)             ;ST(0)=Y*Y
        FLD     ST(1)                   ;ST(0)=X
        FMUL    ST(0),ST(0)             ;ST(0)=X*X
        FADDP   ST(1),ST(0)             ;ST(0)=X*X+Y*Y
        FSQRT
        FDIVP   ST(1),ST(0)             ;ST(0)=X/SQRT(X*X+Y*Y)
;IS BIT C0 ON?
        TEST    AH,1B
        JZ      C0OFF
        NOT     SIGN_STORE
C0OFF:
;DO WE NEED TO CHANGE SIGN?
        CMP     SIGN_STORE,0
        JE      LEAVE_POS
        FCHS
LEAVE_POS:
        POP     BX
        POP     AX
        RET
SINE    ENDP


;COSINE {ST}                    510 MICROSECONDS
;THETA IN ST IS ASSUMED TO BE A VALID NUMBER
;THERE MUST BE AT LEAST 3 FREE STACK LOCATIONS
;THIS ROUTINE USES THE SINE ROUTINE
COSINE  PROC    NEAR
        PUSH    AX
        PUSH    BX
        FABS                            ;COS(-X)=COS(X)
        MOV     SIGN_STORE,0            ;ITS POSITIVE NOW
        MOV     REALLY_COS,-1
        JMP     COS_ENTRY               ;SINE POPS BX, AX AND RETURNS
COSINE  ENDP
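The same bookkeeping for SINE and COSINE can be sketched in Python. As in the tangent check, fptan_ratio is a hypothetical stand-in for FPTAN that calls math.tan internally. The sketch makes two points. First, any ratio with the right value yields the sine through X/SQRT(X*X+Y*Y). Second, COSINE only has to add two to the octant (carrying into C0) before it shares the rest of the code. The function names are this sketch's own.

```python
import math

QUARTER_PI = math.pi / 4


def fptan_ratio(a):
    """Hypothetical stand-in for FPTAN: (numerator, denominator) of tan(a)."""
    return math.tan(a), 1.0


def sine(x, really_cos=False):
    """Sketch of SINE; COSINE enters with really_cos=True (see COS_ENTRY)."""
    negative = (x < 0) and not really_cos   # SIGN_STORE: sin(-x) = -sin(x)
    x = abs(x)                              # FABS
    r = math.fmod(x, QUARTER_PI)            # FPREM remainder
    octant = round((x - r) / QUARTER_PI)
    if really_cos:
        octant += 2                         # cos(x) = sin(x + pi/2): two octants on
    c0, c3, c1 = (octant >> 2) & 1, (octant >> 1) & 1, octant & 1
    if r == 0:
        num, den = (1.0, 1.0) if c1 else (0.0, 1.0)
    elif c1:
        num, den = fptan_ratio(QUARTER_PI - r)
    else:
        num, den = fptan_ratio(r)
    if c3 ^ c1:
        num, den = den, num                 # FXCH: we want the cosine of the angle
    result = num / math.sqrt(num * num + den * den)
    if c0:
        negative = not negative             # octants 4-7: NOT SIGN_STORE
    return -result if negative else result


def cosine(x):
    """Sketch of COSINE: take |x| and reuse the sine code."""
    return sine(abs(x), really_cos=True)
```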




For further explanation of trigonometric calculations and for programs which perform sophisticated error checking, see

Getting Started With the Numeric Data Processor, by Bill Rash, Intel Corporation, Application Note AP-113.

Inverse Trigonometric Functions 

The 8087 instruction FPATAN performs the core calculations for the inverse trigonometric functions: Arctan, Arcsin, Arccos, Arccot, Arccsc, and Arcsec. Just as FPTAN produces a result in the form Y/X, so FPATAN accepts an argument in the form Y/X. The inverse trigonometric functions require somewhat less programming, because the argument range is less restricted for FPATAN than for FPTAN. (The direct trigonometric functions are periodic, whereas the inverse trigonometric functions aren't.) For FPATAN, we need only assure that the arguments obey the relation 0 < Y < X < infinity. Thus to compute Arctan(Z) we need to check seven cases: Z equal to 0, and Z positive or negative with ABS(Z) less than, equal to, or greater than 1. We bring Z into the proper range by using the identities:

Arctan(Z) = -Arctan(-Z)

Arctan(Z) = pi/2 - Arctan(1/Z)     (for Z > 0)


;ARCTAN {ST}                    351 MICROSECONDS
;ST IS ASSUMED TO BE A NORMAL NUMBER
;THERE MUST BE AT LEAST 3 FREE STACK LOCATIONS
;THIS ROUTINE ASSUMES THAT THE FOLLOWING MEMORY
;LOCATIONS HAVE BEEN DEFINED:
;STATUS_WORD    2 BYTES
;SIGN_STORE     1 BYTE
ARCTAN  PROC    NEAR
        PUSH    AX
;THE FIRST PROBLEM IS TO CHECK FOR A ZERO OR NEGATIVE ARGUMENT
        MOV     SIGN_STORE,0            ;ASSUME NON-NEGATIVE
        FTST
        FSTSW   STATUS_WORD
        FWAIT
        MOV     AH,BYTE PTR STATUS_WORD+1
        SAHF
        JA      POSITIVE
        JZ      ZERO                    ;JUMP IF ITS ZERO
        JMP     NEGATIVE
ZERO:
;ARCTAN(0)=0
        FSTP    ST(0)
        FLDZ
        JMP     DONE
NEGATIVE:
;DEAL WITH A NEGATIVE ARGUMENT USING IDENTITY
;ARCTAN(-X)=-ARCTAN(X)
        FCHS
        MOV     SIGN_STORE,-1
POSITIVE:
;HOW DOES 1 COMPARE TO X?
        FLD1
        FCOM
        FSTSW   STATUS_WORD
        FWAIT
        MOV     AH,BYTE PTR STATUS_WORD+1
        SAHF
        JA      Z_LT_1
        JC      Z_GT_1
;EXACTLY 1, RETURN ARCTAN(1)=PI/4
        FCHS                            ;ST NOW=-1
        FADD    ST(0),ST(0)             ;ST=-2
        FLDPI
        FSCALE                          ;ST NOW PI/4
        FSTP    ST(1)                   ;DUMP -2
        FSTP    ST(1)                   ;DUMP Z, LEAVING PI/4
        JMP     RESTORE_SIGN
Z_GT_1:
;USE IDENTITY ATAN(X) = PI/2 - ATAN(1/X)
        FXCH                            ;ST=Z, ST(1)=1
        FPATAN
        FLD1                            ;NOW ADJUST BY PI/2
        FCHS
        FLDPI
        FSCALE
        FSTP    ST(1)
        FSUBRP  ST(1),ST
        JMP     RESTORE_SIGN
Z_LT_1:
        FPATAN                          ;ST=1, ST(1)=Z
RESTORE_SIGN:
        TEST    SIGN_STORE,0FFH
        JZ      DONE
        FCHS
DONE:
        POP     AX
        RET
ARCTAN  ENDP
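The seven cases of ARCTAN can be checked with a short Python sketch. fpatan is a hypothetical stand-in for FPATAN built on math.atan2. As in the routine, it is only ever called with 0 < y < x, so the sketch exercises the case analysis and the two identities rather than the 8087 itself.

```python
import math


def fpatan(y, x):
    """Hypothetical stand-in for FPATAN: arctan(y/x), used only with 0 < y < x."""
    return math.atan2(y, x)


def arctan(z):
    """Sketch of the ARCTAN routine's seven cases."""
    if z == 0:
        return 0.0                             # ARCTAN(0)=0
    negative = z < 0                           # SIGN_STORE
    z = abs(z)                                 # Arctan(Z) = -Arctan(-Z)
    if z == 1:
        result = math.pi / 4                   # exactly 1: return PI/4
    elif z > 1:
        result = math.pi / 2 - fpatan(1.0, z)  # Arctan(Z) = pi/2 - Arctan(1/Z)
    else:
        result = fpatan(z, 1.0)                # 0 < Z < 1: FPATAN directly
    return -result if negative else result
```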
Processor Control Instructions

Sixteen instructions are used to examine and control the internal status of the 8087. We make regular use of the instructions that manipulate the status word and the control word. In particular, these instructions are used for examining the results of comparisons and for setting controls, such as rounding, on the 8087. Most of the other instructions are needed only for writing system programs; we discuss them briefly for completeness. The processor control instruction FSTSW (store status word) was discussed in Chapter 6.
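Every listing in this chapter reads comparison results the same way. It stores the status word with FSTSW, loads the high byte into AH, and then either tests bits directly or uses SAHF to copy them into the 8088 flags. The Python sketch below restates that bit layout as a quick reference: C0 is bit 0, C1 bit 1, C2 bit 2 and C3 bit 6 of the high byte. The function names are hypothetical and belong to this sketch only.

```python
# Condition-code bits in the HIGH byte of the 8087 status word, i.e. the byte
# the listings load with MOV AH,BYTE PTR STATUS_WORD+1.
C0, C1, C2, C3 = 0x01, 0x02, 0x04, 0x40


def sahf_flags(ah):
    """8088 flags set by SAHF from AH: C0 -> CF, C2 -> PF, C3 -> ZF."""
    return {"CF": bool(ah & C0), "PF": bool(ah & C2), "ZF": bool(ah & C3)}


def ftst_result(ah):
    """Meaning of C3 and C0 after FTST (ST compared with 0)."""
    c3, c0 = bool(ah & C3), bool(ah & C0)
    if c3 and c0:
        return "unordered"   # e.g. ST is a NaN
    if c3:
        return "zero"
    if c0:
        return "negative"
    return "positive"


def fprem_incomplete(ah):
    """C2 set after FPREM: reduction unfinished, so the listings JP RANGE again."""
    return bool(ah & C2)


def fprem_octant(ah):
    """Low three quotient bits FPREM leaves in C0, C3 and C1, read as an octant."""
    return 4 * bool(ah & C0) + 2 * bool(ah & C3) + bool(ah & C1)
```

This is why the trigonometric routines test C2 with JP, C1 with TEST AH,10B, and C3 with TEST AH,01000000B.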

FLDCW word-integer 4 microseconds 

FLDCW (load control word) loads a word from a two-byte memory location into the 8087's internal control word register. FLDCW is used, for example, to change the 8087 rounding control.

FSTCW word-integer 5 microseconds 

FSTCW (store control word) stores the 8087 control word at the two-byte 
destination location. We used FSTCW earlier to save a clean copy of the 
control word before changing rounding control. Later we used FLDCW 
to restore the control word to its original state. 

FWAIT 

FWAIT is actually an 8088, not an 8087, instruction. (The FWAIT mnemonic generates the 8088 WAIT instruction.) FWAIT halts the 8088 until the 8087 completes its current instruction. FWAIT should be coded before any 8088 instruction that references a memory location being read from or written to by the 8087. During an FWAIT, the 8088 checks the 8087 once per microsecond, and resumes execution as soon as the 8087 is free.

The description of the remaining processor control instructions is included for completeness. None of these instructions is necessary for the programs in this book.

The following two instructions are useful in writing subroutines because they allow a subroutine to save a copy of the 8087's internal state and then restore it.

FSAVE memory 44 microseconds 

FSAVE (save state) copies all internal 8087 information into a 94-byte area 
in memory. It then reinitializes the processor by executing an FINIT (see 
below). Figure 12.1 illustrates the layout of the memory save area. 

The reinitialization feature of FSAVE can cause undesired side effects, 
such as unintentionally resetting rounding control. The control word is 
easily restored by following "FSAVE memory" with "FLDCW memory." 





[Figure 12.1 diagram: the 94-byte save area, drawn in order of increasing addresses.]

S = Sign
Bit 0 of each field is the rightmost, least significant bit of the corresponding register field.
Bit 63 of the significand is the integer bit (the assumed binary point is immediately to the right).

Figure 12.1. Memory layout for 8087 internal state. (Used with permission of Intel Corporation.)


FRSTOR memory 44 microseconds 

FRSTOR (restore state) reloads the 8087 state from the 94-byte area in memory, effectively "undoing" a previous FSAVE.

FSAVE and FRSTOR provide a mechanism by which a subroutine can use the 8087 and then return it to its original state. BASIC requires us to protect certain 8088 registers in an analogous way. (That's why many of our routines started with "PUSH BP" and ended with "POP BP.") Use of FSAVE/FRSTOR may or may not be required, depending on the conventions of a given language translator. The following code can be used to save the 8087 state onto the 8088 stack and later restore it.
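        SUB     SP,94                   ;RESERVE 94 BYTES ON THE 8088 STACK
        MOV     BP,SP
        FSAVE   [BP]                    ;SAVE 8087 STATE (AND REINITIALIZE)
;(CODE THAT USES THE 8087 GOES HERE)
        MOV     BP,SP
        FRSTOR  [BP]                    ;RESTORE THE SAVED STATE
        ADD     SP,94                   ;RELEASE THE STACK SPACE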


FINIT 1 microsecond 

FINIT (initialize processor) resets the 8087. The initialized conditions are 
described in Figure 12.2. 


Field                      Value    Interpretation

Control Word
  Infinity Control         0        Projective
  Rounding Control         00       Round to nearest
  Precision Control        11       64 bits
  Interrupt-enable Mask    1        Interrupts disabled
  Exception Masks          111111   All exceptions masked
Status Word
  Busy                     0        Not busy
  Condition Code           ????     (Indeterminate)
  Stack Top                000      Empty stack
  Interrupt Request        0        No interrupt
  Exception Flags          000000   No exceptions
Tag Word
  Tags                     11       Empty
Registers                  N.C.     Not changed
Exception Pointers
  Instruction Code         N.C.     Not changed
  Instruction Address      N.C.     Not changed
  Operand Address          N.C.     Not changed

Figure 12.2. 8087 initial conditions. (Used with permission of Intel Corporation.)
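The earlier description of FSTCW and FLDCW implies a pattern: store the control word, change a field, load it back, and later reload the saved copy. The Python sketch below builds the control word implied by the FINIT column of Figure 12.2 and then replaces only the two rounding-control bits, as a program would between FSTCW and FLDCW. The bit positions follow the standard 8087 layout. The helper names are hypothetical, and the sketch leaves the reserved bit 6 clear, so a real chip may report a different value in that bit.

```python
# Control-word fields from Figure 12.2 (bit positions of the 8087 control word).
MASK_BITS = 0x3F       # bits 0-5: exception masks
IEM = 1 << 7           # interrupt-enable mask
PC_SHIFT = 8           # bits 8-9: precision control
RC_SHIFT = 10          # bits 10-11: rounding control
IC = 1 << 12           # infinity control (0 = projective)

ROUNDING = {0b00: "round to nearest", 0b01: "round down",
            0b10: "round up", 0b11: "chop"}


def finit_control_word():
    """Control word built from the FINIT column of Figure 12.2."""
    return MASK_BITS | IEM | (0b11 << PC_SHIFT)   # RC = 00 and IC = 0 need no bits


def with_rounding(cw, rc):
    """New control word with only the two rounding-control bits replaced."""
    return (cw & ~(0b11 << RC_SHIFT)) | ((rc & 0b11) << RC_SHIFT)


def rounding_mode(cw):
    """Name of the rounding mode selected by bits 10-11."""
    return ROUNDING[(cw >> RC_SHIFT) & 0b11]
```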


Interrupt and Exception-handling Instructions 

Normally, we allow exceptions to be masked; that is, the 8087 hardware handles computational errors automatically. If a given exception type is unmasked, the 8087 will interrupt the 8088 when the exception occurs. In this way, a computational error can be processed by user- or system-specified exception-handling software. If the 8088 is handling a task with higher priority than accepting 8087 "messages," 8087 interrupts can be disabled with the FDISI instruction.

FDISI 1 microsecond 

FDISI (disable interrupts) disables interrupts by setting the interrupt enable mask bit in the control word.


FENI 1 microsecond 

FENI (enable interrupts) enables interrupts by clearing the interrupt enable mask bit in the control word.


FCLEX 1 microsecond 

FCLEX (clear exceptions) clears the exception flags, the interrupt request 
flag, and the busy flag in the status word. FCLEX is principally used by 
exception-handling routines after an exception has been taken care of. If 
the exception were not cleared before returning control to the 8087, a 
second interrupt request would be issued immediately. 

FSTENV memory 11 microseconds 

FSTENV (store environment) stores the control, status, and tag words, and the exception pointers, in a 14-byte memory area so that these items may be examined by an exception-handling routine. (The 94-byte FSAVE area is this 14-byte environment followed by the eight 10-byte registers.) FSTENV stores a subset of the information stored by FSAVE and operates with considerably greater speed. Figure 12.3 illustrates the layout of the save area.


[Figure 12.3 diagram: the environment area in order of increasing addresses, with word offsets +0 through +12, including the INSTRUCTION POINTER and OPERAND POINTER fields.]

Figure 12.3. Memory layout for 8087 internal "environment" information. (Used with permission of Intel Corporation.)

FLDENV memory 10 microseconds 

FLDENV (load environment) loads the control, status, and tag words, and the exception pointers, from a 14-byte memory area laid out as in Figure 12.3.
FINCSTP 2 microseconds

FINCSTP (increment stack pointer) increments the 8087 stack pointer. 
Do not use this instruction to pop the stack, since it does not mark ST 
as empty. Use FSTP ST(0) instead. 

FDECSTP 2 microseconds 

FDECSTP (decrement stack pointer) decrements the 8087 stack pointer. 

FFREE ST(i) 2 microseconds 

FFREE (free register) marks the indicated register as empty. 

FNOP 3 microseconds 

FNOP (no operation) does nothing; its effect is the same as FST ST,ST(0), which copies ST onto itself.

Advanced Instruction Set Summary 

This chapter has seen much intricate detail. It took, for example, about 
90 instructions to calculate a tangent even though the 8087 has a built-in 
“tangent" instruction. (If you think it took a lot of work this way, try 
writing a tangent instruction using only 8088 code!) 

It is more difficult to build a small set of assembly language modules for non-linear problems than it is for linear problems. This is a place that really calls for an 8087-compatible language translator. (The non-linear programs in the next chapter are written in BASIC for this reason.) Nonetheless, it is instructive to see just how much improvement we can expect from the 8087.

Without the 8087, compiled BASIC requires about 26,800 microseconds to calculate a double precision tangent. Our assembly language program uses about 460 microseconds, roughly 58 times faster. Even using a poor 8087-compatible translator, you can look for an order of magnitude speed improvement on non-linear operations.





Non-Linear Methods 


Given a non-linear function, y = f(x), how do we find the value of x that makes y equal to zero? The value that makes y equal one? What value of x gives the maximum possible value of the function? Answers to these and related questions are the subject of this chapter. BASIC's DEF FN statement makes it easy to define an algebraic formula as a function, f(x).
For example, suppose we wish to explore the function 

$$y = 17 - (x - 12)^2$$

We write this in BASIC as 

10 DEFDBL Y,X
20 DEF FNY(X)=17-(X-12)^2

Of course, this particular function could be coded in assembly language 
in only a few minutes. A really complicated function might take some 
time. Worse, every time we need to work with a new function, we would 
need to write a new assembly language routine. Non-linear programs 
call for use of a high-level language. We use BASIC due to its widespread 
availability for personal computers. 

In this chapter we discuss: 

• Numerical differentiation 

• Numerical integration 

• Solving a non-linear equation 

• Non-linear optimization 

For many readers, the most interesting topic may be "solving a non-linear equation." Because the first sections of the chapter provide useful background material for solving non-linear equations, you should probably work through them as well.

For a concrete focus of the discussion which follows, look now at the 
chart of the function y=f(x). Figure 13.1 shows the plot of our sample 
function. 
This particular function equals zero at x = 7.88 and x = 16.12. It reaches its maximum value, 17.0, at x = 12.
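The zeros and the maximum quoted above can be read directly from the formula. Setting y to zero gives

$$17 - (x-12)^2 = 0 \quad\Longrightarrow\quad x = 12 \pm \sqrt{17} \approx 12 \pm 4.123,$$

that is, x ≈ 7.88 and x ≈ 16.12. Because (x-12)^2 can never be negative, y can never exceed 17, and it reaches that value at x = 12.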

Numerical Differentiation 

The derivative of a function is the slope of the function at a particular 
point. To find the derivative graphically, draw a line tangent to the 
function at the point of interest and measure the ratio of the change in 
the vertical distance to the change in the horizontal distance, as in Figure 
13.2. 

The computer can't very well draw such a line (at least, not unless it knows the slope). Since the computer can easily evaluate the function, we approximate the tangent line by picking another point close to the point of interest and having the computer effectively "draw" a line to connect these two points. Figure 13.3 shows an "enlargement" of a small part of the function with just such a line.



The ratio of the vertical to the horizontal change is approximately the derivative of the function. If we pick the second point quite close to the first point, then the approximation will be quite accurate. How close do we need to be to assure some desired level of accuracy? The usual procedure is to evaluate the derivative once and then re-evaluate it with a closer second point. If the two answers lie within a distance "epsilon" of each other (that is, if they are no more than epsilon apart), then the answers are probably within epsilon of the true answer as well.

Figure 13.3. Enlarged view of tangent line.

The following BASIC program evaluates the derivative of the function FNY at the point X0, assuming we require an answer accurate to within plus or minus EPS.

10 DEFDBL Y,X,F,E,D
20 DEF FNY(X)=17-(X-12)^2
30 REM SET X0 EPS ITLIM
40 X0=14
50 EPS=.001
60 ITLIM=100
70 DELTA=.01*X0
80 FX0=FNY(X0)
90 DIFOLD=(FNY(X0+DELTA)-FX0)/DELTA
100 IT=1
110 REM LOOP UNTIL CONVERGENCE OR ITERATION LIMIT REACHED
120 DELTA=DELTA/2
130 DIF=(FNY(X0+DELTA)-FX0)/DELTA
140 IF ABS(DIF-DIFOLD)<EPS THEN 1000
150 IT=IT+1
160 IF IT>ITLIM THEN 2000
170 DIFOLD=DIF
180 GO TO 120
1000 REM CONVERGENCE ACHIEVED
1010 PRINT "DERIVATIVE AT ";X0;" IS ";DIF
1020 PRINT " AFTER ";IT;" ITERATIONS"
1030 STOP
2000 REM NO CONVERGENCE
2010 PRINT "FAILED TO CONVERGE AFTER ";ITLIM;" ITERATIONS"
2020 PRINT "APPROXIMATE DERIVATIVE AT ";X0;" IS ";DIF
2030 STOP
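For readers who want to experiment away from BASIC, here is a Python port of the derivative program. It is a sketch: it keeps the BASIC variable names in lowercase and uses the same two stopping rules (agreement within EPS, or the iteration limit). The names fny and derivative are this sketch's own.

```python
def fny(x):
    """Sample function: 17-(X-12)^2."""
    return 17 - (x - 12) ** 2


def derivative(f, x0, eps=0.001, itlim=100):
    """Sketch of the derivative program: returns (estimate, iterations, converged)."""
    fx0 = f(x0)                                # evaluate once and save (FX0)
    delta = 0.01 * x0                          # starting step, proportional to X0
    difold = (f(x0 + delta) - fx0) / delta     # first slope estimate
    it = 1
    while True:
        delta /= 2                             # move the second point closer
        dif = (f(x0 + delta) - fx0) / delta
        if abs(dif - difold) < eps:            # two estimates agree: converged
            return dif, it, True
        it += 1
        if it > itlim:                         # guaranteed stopping mechanism
            return dif, it, False
        difold = dif
```

As in the BASIC, the starting step is proportional to X0, so a starting point of zero would need a different choice of DELTA.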
Always take the accuracy of this sort of numerical approximation with a grain of salt. It can happen that two successive approximations are close to one another without being equally close to the correct answer. Round-off error can also give results a false appearance of accuracy. The arithmetic operation at which computers are the least accurate is subtracting two numbers that are nearly equal in value, as in FNY(X0+DELTA) - FNY(X0), for example.

By the way, you should always include an "iteration limit" in a program 
that otherwise relies on a mathematical condition to stop. Computer 
arithmetic is imperfect. With sufficient bad luck ("sufficient bad luck" 
means "sooner or later for sure"), your program will end up in an endless 
loop, if it doesn't have a guaranteed stopping mechanism. 

The execution time for a non-linear program is roughly proportional to the number of function evaluations. That's why we evaluated FNY(X0) early in the program and saved the answer.

Numerical Integration 

Integration is the inverse operation of differentiation. Integration tells us the area under a curve between two points. The area under our sample function, from XLOWER to XUPPER, is shown in Figure 13.4.

We can approximate the area under the curve by drawing in rectangles 
as in Figure 13.5. The total area in all the rectangles is approximately the 
area under the curve. The more, and smaller, the rectangles we draw, 
the closer we come to the answer. 

If we draw n rectangles, we make the width of each one one-nth of the distance between XLOWER and XUPPER. Since the area of a rectangle is just its height times its width, and since each of the n rectangles has the same width, (XUPPER-XLOWER)/n, we can find the area by just adding up the heights and multiplying the sum by that width. To obtain a more accurate answer, we cut each old rectangle in half and add new rectangles, as in Figure 13.6.

The following BASIC program integrates the function FNY. 

10 DEFDBL Y,X,F,E,D,A
20 DEF FNY(X)=17-(X-12)^2
30 REM SET XLOWER XUPPER EPS ITLIM
40 XLOWER=1
50 XUPPER=13
60 EPS=.001
70 ITLIM=100
80 XWIDTH=XUPPER-XLOWER
90 FSUM=FNY(XLOWER)+FNY(XUPPER)
100 AREAOLD=FSUM*XWIDTH/2
110 IT=1
120 N=1
130 REM LOOP UNTIL CONVERGENCE OR ITERATION LIMIT REACHED
140 XWIDTH=XWIDTH/2
150 N=N*2
160 FSUM=0
170 FOR I=1 TO N STEP 2
180 X=XLOWER+XWIDTH*I
190 FSUM=FSUM+FNY(X)
200 NEXT I
210 AREA=FSUM*XWIDTH+(AREAOLD/2)
220 IF ABS(AREA-AREAOLD)<EPS THEN 1000
230 IT=IT+1
240 IF IT>ITLIM THEN 2000
250 AREAOLD=AREA
260 GO TO 140
1000 REM CONVERGENCE ACHIEVED
1010 PRINT "INTEGRAL FROM ";XLOWER;" TO ";XUPPER;" IS ";AREA
1020 PRINT " AFTER ";IT;" ITERATIONS"
1030 STOP
2000 REM NO CONVERGENCE
2010 PRINT "FAILED TO CONVERGE AFTER ";ITLIM;" ITERATIONS"
2020 PRINT "APPROXIMATE INTEGRAL IS ";AREA
2030 STOP

Figure 13.4. Area under function y = 17-(x-12)^2.

Figure 13.5. Approximation to area under function.
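Here is a Python sketch of the same successive-halving idea. Each pass halves the strip width, evaluates the function only at the new midpoints, and adds half of the previous area. That bookkeeping is the trapezoid rule, in which the two end heights count half. This is the sketch's reading of the BASIC listing, not a quotation of it, and the function names are hypothetical.

```python
def fny(x):
    """Sample function: 17-(X-12)^2."""
    return 17 - (x - 12) ** 2


def integrate(f, xlower, xupper, eps=0.001, itlim=100):
    """Successive-halving trapezoid sketch: returns (area, iterations, converged)."""
    xwidth = xupper - xlower
    area_old = (f(xlower) + f(xupper)) * xwidth / 2   # one trapezoid
    n = 1
    it = 1
    while True:
        xwidth /= 2                                   # halve every strip
        n *= 2
        # only the new midpoints (odd I) are evaluated; old points live in area_old
        fsum = sum(f(xlower + xwidth * i) for i in range(1, n, 2))
        area = fsum * xwidth + area_old / 2
        if abs(area - area_old) < eps:
            return area, it, True
        it += 1
        if it > itlim:
            return area, it, False
        area_old = area
```

Reusing the old sum means each pass costs only as many function evaluations as there are new points, in keeping with the advice to count function evaluations.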

Of course, one can usually use calculus in place of numerical computation. The formula for the derivative of our sample function is 24 - 2x. The formula for the integral is 17x - (1/3)(x-12)^3.

Derivatives can be found by applying the rules of calculus mechanically, so sometimes packaged programs actually figure out the formula for the derivative instead of using numerical methods. Integrals, in general, cannot be found by purely mechanical rules.
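As a check on these formulas, the derivative at x = 14 is 24 - 2(14) = -4. Over the range used by the integration program (XLOWER=1, XUPPER=13), the integral works out exactly:

$$\int_{1}^{13} \left[17 - (x-12)^2\right] dx = \left[17x - \tfrac{1}{3}(x-12)^3\right]_{1}^{13} = \left(221 - \tfrac{1}{3}\right) - \left(17 + \tfrac{1331}{3}\right) = -240$$

The answer is negative because the function lies below the x-axis between x = 1 and x = 7.88.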

Figure 13.6. Refined approximation to area under function.

Solving a Non-linear Equation

Suppose we have a function y = f(x) and know that the value of y is Y0. How can we find the value of x that produced Y0? Suppose that Y0 = 0 and look back at Figure 13.2. Start at the point (X0,f(X0)). If the function f(x) were actually a straight line, we could just run our finger down the tangent line until we hit the x-axis at (X1,0). Since f(x) is not a straight line, when we evaluate the function at X1 we actually get the point (X1,f(X1)) instead. Now draw a tangent line from the new point and try again.

If the function f(x) is sufficiently smooth, this "shooting method" (Newton's method, in textbook terms) will usually converge to the correct point fairly quickly. However, sometimes after we shoot down the tangent, we are even further from the correct answer than we were originally. We'll add another rule to the procedure to prevent this. If our new guess is even further from the right spot than the initial guess, we cut in half the size of the step we took and try again. The BASIC program below implements this modified shooting method.

10 DEFDBL Y,X,F,E,D
20 DEF FNY(X)=17-(X-12)^2
30 DEF FNDIF(X)=(FNY((1+DEL)*X)-FNY(X))/(DEL*X)
40 REM SET YTARGET EPS DEL X0 ITLIM
50 YTARGET=0
60 DEL=.001
70 EPS=.01
80 X0=14
90 X=X0
100 ITLIM=20
110 STEPLIM=5
120 Y=FNY(X)
130 IT=1
140 ISTEP=0
150 STEPSIZE=1
160 XNEW=X+STEPSIZE*((YTARGET-Y)/FNDIF(X))
170 YNEW=FNY(XNEW)
180 IF ABS(YNEW-YTARGET)<EPS THEN 1000
190 IF ABS(YNEW-YTARGET)>ABS(Y-YTARGET) THEN 500
200 Y=YNEW
210 X=XNEW
220 IT=IT+1
230 IF IT>ITLIM THEN 2000
240 GOTO 140
500 REM REDUCE STEP SIZE
510 ISTEP=ISTEP+1
520 IF ISTEP>STEPLIM THEN 220
530 STEPSIZE=STEPSIZE/2
540 GOTO 160
1000 PRINT "SOLUTION IS ";XNEW;" AFTER ";IT;" ITERATIONS"
1010 STOP
2000 PRINT "FAILED TO CONVERGE AFTER ";ITLIM;" ITERATIONS"
2010 PRINT "APPROXIMATE ANSWER ";XNEW
2020 STOP
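A Python port of the modified shooting method follows as a sketch. The parameter rel_step plays the role of DEL (del is a reserved word in Python). The nested loop replaces the jumps between the step-halving code and the main loop. The function names are hypothetical.

```python
def fny(x):
    """Sample function: 17-(X-12)^2."""
    return 17 - (x - 12) ** 2


def solve(f, ytarget, x0, eps=0.01, rel_step=0.001, itlim=20, steplim=5):
    """Sketch of the modified shooting method: returns (x, iterations, converged)."""
    def fndif(x):                               # relative forward difference (FNDIF)
        return (f((1 + rel_step) * x) - f(x)) / (rel_step * x)

    x, y = x0, f(x0)
    it = 1
    while True:
        stepsize, istep = 1.0, 0
        while True:
            xnew = x + stepsize * ((ytarget - y) / fndif(x))   # shoot down the tangent
            ynew = f(xnew)
            if abs(ynew - ytarget) < eps:                      # close enough: done
                return xnew, it, True
            if abs(ynew - ytarget) <= abs(y - ytarget):        # no worse: accept step
                x, y = xnew, ynew
                break
            istep += 1                                         # worse: halve the step
            if istep > steplim:
                break                                          # give up on this step
            stepsize /= 2
        it += 1
        if it > itlim:
            return xnew, it, False
```

Starting from 14 the sketch finds the root near 16.12; starting from 4 it finds the one near 7.88, which illustrates how the answer depends on the initial value.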


For the sample function, there are actually two correct answers for 
some values of YTARGET. This program only finds one, usually the 
closest to the initial starting point XO. In order to check for more than 
one solution, the program can be rerun with several different initial 
values. 

Notice that the program uses DEF FNDIF to approximate the derivative. To increase the accuracy of the final solution, DEL should generally be reduced along with EPS. FNDIF could be redefined to give the exact derivative by using calculus. This would speed up the program a little by reducing the number of function evaluations and possibly also because of the greater accuracy of an exact derivative. On the other hand, figuring out analytic derivatives is more work for the user.

Non-linear Optimization 

Suppose that y in our sample function described the profits of a small 
programming business as a function of the number of hours, x, spent 
typing on the keyboard of a personal computer. We would like to maximize 
this function, that is, find the value of x that gives us the highest possible 
value of y. 
If you look back at Figure 13.1 you will see that at the highest point of the function the slope of the function is zero. In calculus terms, the derivative of a function equals zero at its maximum. (In logical terms, if a function is still going up, we should search further to the right; if it's going down, we've gone too far. Exactly at the maximum, the function must be going neither up nor down. Its slope must be zero.)

Finding the maximum of a function f(x) reduces to finding the point 
where the derivative of f(x) equals zero. The previous program will handle 
this quite nicely if we redefine the function calls to look at the derivative 
instead of the original function and the calls on the function for the 
derivative to look at the derivative of the derivative. 

10 DEFDBL Y,X,F,E,D
20 DEF FNY(X)=17-(X-12)^2
30 DEF FNDIF(X)=(FNY((1+DEL)*X)-FNY(X))/(DEL*X)
35 DEF FNDDIF(X)=(FNDIF((1+DEL)*X)-FNDIF(X))/(DEL*X)
40 REM SET YTARGET EPS DEL X0 ITLIM
50 YTARGET=0
60 DEL=.001
70 EPS=.001
80 X0=14
90 X=X0
100 ITLIM=20
110 STEPLIM=5
120 Y=FNDIF(X)
130 IT=1
140 ISTEP=0
150 STEPSIZE=1
160 XNEW=X+STEPSIZE*((YTARGET-Y)/FNDDIF(X))
170 YNEW=FNDIF(XNEW)
180 IF ABS(YNEW-YTARGET)<EPS THEN 1000
190 IF ABS(YNEW-YTARGET)>ABS(Y-YTARGET) THEN 500
200 Y=YNEW
210 X=XNEW
220 IT=IT+1
230 IF IT>ITLIM THEN 2000
240 GOTO 140
500 REM REDUCE STEP SIZE
510 ISTEP=ISTEP+1
520 IF ISTEP>STEPLIM THEN 220
530 STEPSIZE=STEPSIZE/2
540 GOTO 160
1000 FM=FNY(XNEW)
1010 IF FM>=FNY((1+DEL)*XNEW) AND FM>=FNY((1-DEL)*XNEW) THEN 1050
1020 PRINT "CAN'T FIND MAXIMUM"
1030 PRINT "STOPPED AT ";XNEW;" AFTER ";IT;" ITERATIONS"
1040 STOP
1050 PRINT "MAXIMUM IS AT ";XNEW;" AFTER ";IT;" ITERATIONS"
1060 PRINT "VALUE AT THE MAXIMUM IS ";FNY(XNEW)
1070 STOP
2000 PRINT "FAILED TO CONVERGE AFTER ";ITLIM;" ITERATIONS"
2010 PRINT "APPROXIMATE ANSWER ";XNEW
2020 STOP
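The maximization program can be sketched in Python the same way, with one deliberate departure. Both derivative estimates below use central differences, (f(x+h)-f(x-h))/(2h) with h = DEL*x, instead of the listing's forward differences. A forward difference is off from f'(x) by about f''(x)*h/2, which moves the point where the estimate crosses zero by roughly h/2. Central differences are exact for a quadratic such as FNY, so the neighbour check at the end stays decisive. Treat this as an assumption of the sketch, not as the book's method. The function names are hypothetical.

```python
def fny(x):
    """Sample function: 17-(X-12)^2."""
    return 17 - (x - 12) ** 2


def maximize(f, x0, eps=0.001, rel_step=0.001, itlim=20, steplim=5):
    """Sketch: Newton's method on f' with step halving, then a neighbour check.

    Returns (x, f(x), looks_like_maximum, converged).
    """
    def fndif(x):                     # central-difference f'(x) (a departure)
        h = rel_step * x
        return (f(x + h) - f(x - h)) / (2 * h)

    def fnddif(x):                    # central-difference f''(x)
        h = rel_step * x
        return (fndif(x + h) - fndif(x - h)) / (2 * h)

    x, y = x0, fndif(x0)              # y holds f'(x); the target is zero
    converged = False
    for _ in range(itlim):
        stepsize = 1.0
        for _ in range(steplim + 1):  # full step, then up to steplim halvings
            xnew = x - stepsize * y / fnddif(x)
            ynew = fndif(xnew)
            if abs(ynew) < eps:
                converged = True
                break
            if abs(ynew) <= abs(y):   # closer to f' = 0: accept the step
                x, y = xnew, ynew
                break
            stepsize /= 2
        if converged:
            break
    fm = f(xnew)
    h = rel_step * xnew
    looks_like_max = fm >= f(xnew + h) and fm >= f(xnew - h)   # neighbour check
    return xnew, fm, looks_like_max, converged
```

Applied to a function whose only flat point is a minimum, such as (x-3)^2+1, the sketch converges but the neighbour check reports that the point is not a maximum.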

Our BASIC program finds the maximum, y = 17, at x = 12. Just as a non-linear equation can have exactly one solution, no solution, or many solutions, a function can have one maximum, no maximum, or many maxima. The code at line 1000 checks for the possibility that the program located a point where the derivative equals zero, but which is not a maximum. Such a point might be a minimum or an "inflection point." Even with this check, care on the part of the user is still a good idea. The program has no way to check whether it has found only a "local maximum," that is, whether there might be a point elsewhere that has an even higher value than the point found by the program.

Back to Linearity 

In the next chapter, we build a small statistical analysis system and, in 
so doing, return to linear problems and to the use of assembly language 
modules. 



Statistical Analysis and 
Program Canning 

This chapter has two principal objectives: gaining an understanding of some of the basic techniques of programming for statistical analysis, and working through an example of how to make a "canned" program. This chapter will give you:

• Some basic methods for statistical analysis.

• Some practice in going from mathematical ideas to working programs.

• An adaptable "canned" program (which you can modify if you wish 
to add your own procedures). 

• A complete, working multiple regression package. 



The Cookbook—Chapter 14 

Program:   8087 Statistical Analysis Program
Purpose:   "Canned" program for multiple regression and other statistical analysis.
Input:     Interactive.
Output:    Interactive.
Language:  BASIC with 8087/8088 assembly language modules.


Statistical Analysis 

Three of the basic procedures used in statistical analysis are descriptive statistics, correlation, and multiple regression. These methods are used to summarize data, to examine the relation between different events, and to make tests of scientific hypotheses. We discuss the use of these methods, and how to perform the necessary calculations, below.
One caveat first. The combination of sophisticated statistical methods and high-speed computers has made it possible to draw incorrect conclusions far more easily than at any time in the past. Almost any statistical procedure can be applied to almost any body of data. Just because we can do so does not mean we should. Various important mathematical caveats and warnings are omitted from the discussion below, since a thorough job would occupy a PhD course. We hope that the reader experienced in statistical analysis will not be offended, and that the reader first encountering statistical analysis will be careful.

Descriptive Statistics 

Given repeated observations on an "event," such as the number of cars passing a certain intersection between 8:00 AM and 8:05 AM, we begin a statistical analysis by looking for simple ways to characterize the observed data. Assume we have made n observations. Call a typical datum "x."

The very first question usually asked is "What was the average value of the data?" Calculating an average is simple. The mean of the data is the sum of all the data points divided by the number of observations. The mean of x is often written x̄ ("x-bar").

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

Next we would like to determine whether the observed data all lie close to one another or whether they are spread out over a wide range. The most common measurement is called the variance. Variance is a measure of the dispersion of data around its mean. Essentially, the variance is the average of the squared value of the difference between x and the mean of x. The variance can be calculated by subtracting the mean from each datum, squaring this difference, summing the results, and dividing by n - 1. (It turns out that, under reasonable assumptions, dividing by n - 1 rather than n gives an answer that is correct on average.)

$$\operatorname{var}(x) = \frac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \cdots + (x_n-\bar{x})^2}{n-1}$$

Closely related to the variance is the standard deviation. The standard 
deviation is the square root of the variance. The standard deviation is 
frequently a more convenient measure than the variance because it has 
the same units of measurement as the original data. If you multiply every 
piece of data by, say, 16, you also multiply the mean and the standard 
deviation by 16, while the variance is multiplied by 256. Thus if the 
original data is measured in pounds, then both mean and standard
deviation are measured in pounds (and 16 times either is measured in
ounces), while the variance is measured in the less familiar units of 
"pounds-squared." 
The following rule gives a feeling for the "spread" of your data: If the
data is drawn from a Normal ("bell curve") distribution, then about
two-thirds (roughly 68 percent) of the data should lie in the range from
one standard deviation below the mean to one standard deviation above
the mean.

While these three descriptive statistics are probably the most common, 
there are many others you might also look at. What are the highest and 
lowest values of the data? What's the middle value? Does the data cluster 
around certain values? We'll stop at three: mean, variance, and standard 
deviation. 
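The three statistics can be sketched in a few lines of Python. This is only an illustration, not the book's program (which does the work in BASIC with 8087 routines); the function names and the sample weights are hypothetical. The last lines exercise the scaling claim made earlier: multiplying every datum by 16 multiplies the mean and standard deviation by 16 and the variance by 256.

```python
import math


def mean(xs):
    """Sum of all data points divided by the number of observations."""
    return sum(xs) / len(xs)


def variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def std_dev(xs):
    """Square root of the variance; same units as the data."""
    return math.sqrt(variance(xs))


weights = [2.0, 4.0, 4.0, 5.0, 7.0]   # hypothetical data, in pounds
ounces = [16 * w for w in weights]    # the same data, in ounces

for label, data in (("pounds", weights), ("ounces", ounces)):
    print(label, mean(data), variance(data), std_dev(data))
```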


Correlation 

Given two sets of data, x and y, we are frequently interested in whether 
the two sets of data tend to move together, to move in opposite directions, 
or whether the two appear to be unassociated. Statisticians use the
correlation coefficient as a measure of association between two variables. Two
variables that are exactly proportional to one another have a correlation 
coefficient of one. Two variables that are exactly proportional but that 
move in opposite directions have a correlation coefficient of minus one. 
A zero correlation coefficient usually indicates that knowing x tells you 
nothing about y, and vice versa. 

The correlation coefficient is constructed as a ratio. The numerator
measures how much x and y move together. The denominator measures
how much each moves separately. The numerator is calculated as the
average of the product of x minus x's mean and y minus y's mean. The
denominator is the product of the standard deviations of x and y. (However,
in this context we use n rather than n - 1 in calculating the standard
deviations.)

Guess what's back! Our friend from linear algebra—the inner product. 
(We promised you it was good for more than playing with systems of 
linear equations.) Think about calculating the numerator of the correlation 
coefficient. We begin the calculation by preparing two vectors, the first 
made up of each observation of x minus x's mean and the second made 
up similarly from y. The inner product of these vectors is the sum of the 
product of the elements. So the required average is just the inner product 
divided by n. 

Actually, the same calculation can be done in a simpler form by
avoiding the construction of the two vectors of deviations from the means.
A little algebra shows that the required average can also be calculated
as the inner product of x and y, minus n times the product of the mean
of x and the mean of y, all divided by n. (Equivalently, it is the inner
product divided by n, minus the mean of x times the mean of y.)
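As a check on that algebra, here is a small Python sketch (function names and data are hypothetical) that computes the numerator both ways: once from explicit deviation vectors, once from the raw inner product and the two means.

```python
def cov_deviations(xs, ys):
    """Build both deviation vectors, take their inner product, divide by n."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    return sum(a * b for a, b in zip(dx, dy)) / n


def cov_shortcut(xs, ys):
    """Same quantity without deviation vectors:
    (inner product of x and y - n * mean(x) * mean(y)) / n."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    inner = sum(x * y for x, y in zip(xs, ys))
    return (inner - n * mx * my) / n


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]
print(cov_deviations(xs, ys), cov_shortcut(xs, ys))
```

The shortcut saves building two vectors, but it subtracts two nearly equal numbers when the means are large compared with the spread of the data, so in single precision it can lose accuracy that the deviation form keeps.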

A little exercise for the reader: what's the correlation coefficient
between x and y if the observations of x are all the same? The answer is
that it doesn't make much sense to ask whether x and y move together
if x doesn't move at all. Both the numerator and denominator of the
correlation coefficient equal zero. We'll want to watch for this situation
when programming in order to avoid "Division by zero" error messages.
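Putting the pieces together, here is a minimal Python sketch of the correlation coefficient. It uses n in every denominator, as described above, and returns None rather than dividing by zero when either variable never changes. The function name and the None convention are assumptions for illustration.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient of xs and ys, or None if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Numerator: average product of the deviations from the means.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    # Denominator: product of the standard deviations, computed with n.
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    denom = math.sqrt(sxx * syy)
    if denom == 0.0:
        return None          # constant series: avoid "Division by zero"
    return sxy / denom


print(correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(correlation([5.0, 5.0, 5.0], [1.0, 2.0, 3.0]))
```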


Multiple Regression 

Multiple regression must be far and away the most common statistical
technique for equation estimation and forecasting. Suppose we have a
sequence of observations on a variable to be explained, the dependent
variable, y, and observations made at the same time on several
explanatory, or independent, variables, $x_1$, $x_2$, and so forth. We might
look for a linear relationship between the dependent and independent
variables of the form:

$$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k + u$$

where u is an unobservable error term, reflecting the fact that the x variables
will not explain y perfectly. Regression may be interpreted in two ways:
either as a statistical procedure or as a technique for fitting an equation
to data.

The term "multiple regression" arises out of the statistical
interpretation. We might posit that the equation above is a "true" equation in
nature and that while we have observed a set of y's and x's, we have
been unable to observe the u's. Given a certain set of statistical
assumptions, multiple regression produces optimal estimates of the coefficients
$b_0$ through $b_k$ in the above equation. (The assumptions are fairly
reasonable, but require more mathematics than we want to go into here.)

Further, given these statistical assumptions, we can test hypotheses 
about the coefficients. The coefficients produced by a multiple regression 
are estimates of the true values of b. The regression also produces a 
standard error for each estimated coefficient. There is a two out of three 
chance that the true coefficient lies in a band from one standard error 
below the estimated coefficient to one standard error above the estimated 
coefficient. Chances are about 19 out of 20 that the true coefficient lies 
in a band of plus or minus two standard errors around the estimated 
coefficient. 

Suppose the true coefficient $b_i$ is zero. This is equivalent to saying that
the variable $x_i$ has nothing to do with explaining y. If the estimated
coefficient is far away from zero, in the sense of being many standard
errors away, then it's unlikely that the true coefficient is zero. The ratio
of an estimated coefficient to its standard error is sometimes called the
t-statistic. If the t-statistic of $b_i$ is greater than 2 in absolute value,
then an estimate this far from zero would turn up only about 1 time in 20
if $x_i$ really had nothing to do with explaining y.



14 n Statistical Analysis and Program Canning 217 

(All of the statements above are predicated on what are sometimes
called the "Gauss-Markov" assumptions. See any good statistics or
econometrics text for a thorough discussion of the role of various
mathematical assumptions.)

The mathematics of regression is also known as ordinary least squares. 
We can regard the problem of estimating the coefficients in the preceding 
equation as a question of fitting the equation to the data, without regard 
to any statistical assumptions. The difference between the value of y and 
the value predicted by applying our estimated coefficients to the data x 
is called a residual. Ordinary least squares picks the values for the coefficients
that minimize the sum of the squared residuals.

In addition to the estimated coefficients and their associated standard
errors and t-statistics, a multiple regression produces several auxiliary
statistics. The R-squared is a "goodness of fit" measure: the fraction
(often quoted as a percentage) of the variation of y explained by the
variables x. R-squared equals 1.0 for a perfect fit and 0.0 in the absence
of any fit.

The standard error of the regression estimates the standard deviation 
of the error terms, u. The sum of squared residuals—which is just what 
it sounds like—is used in making various statistical tests. 

We wrote $b_0$ above without any associated x variable. A constant term
$b_0$ is equivalent to the coefficient on an x variable made up of all ones,
which is how we calculate it in the program below. A regression should
almost always have a constant term in it, but our program lets the user
decide whether or not to include one.


Regression Formulas 

Computation of a multiple regression is easily specified in matrix
notation. Let y be a vector containing the values of y and X be a matrix whose
column i holds the values of $x_i$. If b is the vector of estimated coefficients,
then:

$$b = (X'X)^{-1}X'y$$


Remember from Chapter 9 that X' means the transpose of X. 

Let SSR stand for the sum of squared residuals, $s^2$ for the square of
the standard error of the regression, and $R^2$ for R-squared. If there are
n data points and k right-hand side variables (the constant counts as one
of the k), we have

$$SSR = y'y - y'Xb$$
$$s^2 = SSR/(n-k)$$
$$R^2 = 1 - SSR/(y'y)$$
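The formulas translate almost line for line into code. The Python sketch below is illustrative only. invert() is a plain Gauss-Jordan routine standing in for the book's 8087 INV. The standard errors use the usual textbook formula (the square root of $s^2$ times the corresponding diagonal element of $(X'X)^{-1}$), which is not printed above. R² follows the formula exactly as printed, so it measures variation about zero; when a constant is included, many texts instead divide SSR by the sum of squared deviations of y from its mean. All names and data are hypothetical.

```python
import math


def invert(a):
    """Gauss-Jordan inversion with partial pivoting (stand-in for INV)."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            raise ValueError("X'X is singular")
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def regress(y, cols):
    """OLS of y on cols (put a column of ones first to estimate b0)."""
    n, k = len(y), len(cols)
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    yty = sum(v * v for v in y)
    # SSR = y'y - y'Xb; clamp tiny negative values left by rounding.
    ssr = max(yty - sum(bi * v for bi, v in zip(b, xty)), 0.0)
    s2 = ssr / (n - k)
    r2 = 1.0 - ssr / yty            # formula as printed above
    se = [math.sqrt(s2 * inv[i][i]) for i in range(k)]
    t = [bi / sei if sei > 0.0 else None for bi, sei in zip(b, se)]
    return b, ssr, s2, r2, se, t


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
b, ssr, s2, r2, se, t = regress(y, [[1.0] * len(x), x])
print("b =", b, "SSR =", ssr, "R2 =", r2, "t =", t)
```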
Canned Programs

If you flip to the end of this chapter, you'll find a complete listing of our 
"8087 Statistical Analysis Program." You may be struck immediately by 
the fact that the program is several hundred lines long. The code is 
lengthy despite the fact that this is a "plain vanilla" program, and despite 
the fact that all the important mathematics can be specified in only a few 
lines. (The source code for a commercial mainframe statistics package
would be anywhere from 10,000 to a few hundred thousand lines long.) 

Computer scientists use the name "modular programming" to describe 
the technique of breaking up a large problem into several smaller ones, 
each of which can be dealt with independently. Our canned program is 
composed of 19 "modules." The modules are classified according to whether 
they provide a user service, such as regression; a user utility, such as 
data entry; or a system service, such as program initialization. Because 
the program is broken up into small parts in this way, a new service 
could be added for the user with little or no modification to the existing 
modules. The modules in the program listing are: 

User services:

Descriptive statistics—Module 9 
Correlation—Module 10 
Multiple Regression—Module 11 
User utilities: 

Catalog data in memory—Module 3 
Display data—Module 4 
Enter data—Module 5 
Edit data—Module 6 
Save data to disk—Module 7 
Retrieve data from disk—Module 8 
System services:

Menu display and command choice—Module 2
Begin program execution—Module 1
Storage allocation and program initialization—Module 12
Program restart—Module 13
Exit—Module 14
Place variable name in symbol table—Module 15
Form list of names—Module 16
Collect product-moment matrix—Module 17
Error-handling—Module 18
Screen-handling—Module 19


Data Storage 

A program can be regarded as a group of procedures acting on a set of 
data. The program modules can be regarded as communicating with one 
another through the changes they make in the program's data base. In
this case the data base begins with the data observed by the program 
user. Suppose the user has n observations on each of k different variables. 
We'll store the data internally in an n by k matrix called X. Observations 
on variable i are stored as the ith column of matrix X. As it is frequently 
convenient to have a constant vector, the program will automatically set 
column zero of X to equal 1. 

People think in names, not column numbers. Rather than require the 
user to number variables, we store the names of the variables, in a string 
array NAMES$, and let the computer make the connection between the 
user specified variable name and the numerical index for the appropriate 
column of X. NAMES$ can be thought of as a very simple symbol table. 
The first variable in the symbol table, "(CONST)", will always point to 
the ones in column zero. 

Several other variable definitions are also useful. TRUE% and FALSE% 
are set to -1 and 0 respectively. NUMVAR is the number of variables 
the user has defined. MAXVAR is the maximum number of variables the 
system can hold, a number which depends on the available memory. 
NUMOBS is the number of observations on a variable. (Each variable 
must have the same number of observations.) A number of modules 
communicate through the array LISTV, which contains a list of the
column numbers corresponding to a user-specified list of variables. LISTLEN
is the number of elements in LISTV. Finally, we adopt the convention 
that all BASIC variables beginning with a letter between I and N will be 
integer variables. All other variables are single precision unless a "%", 
"#", or "$" is appended to indicate integer, double precision, or string, 
respectively. 

Module By Module 

We undertake here a detailed, module-by-module explanation of the 
statistical analysis program. (Since each module is short, this isn't too 
difficult.) Along the way, we point out some places where the program 
could be made more flexible or more "idiot-proof," albeit at the expense 
of a lot more code. If you read through the code and the explanations 
here, you should find it easy to make your own additions or changes to 
the program. 

We'll "walk through" the program modules in the order which makes 
it easiest to understand, rather than the order in which they appear in 
the code. 


Menu Display—Module 2—Line 2000 

This module displays the available commands and asks for the user's 
choice. The response must be an integer between 1 and 11. Given any 
other response, the program prompts for a new answer. Notice the simple
trick to check whether a number is an integer. The program accepts any
characters as ANSWER$. The VAL function sets ANSWER to zero if the
response is not a number. We then set ANSW% = ANSWER. Since ANSW%
must be an integer, the two will only be equal if ANSWER was an integer.

Once ANSW% is in hand, the routine does an ON ANSW% GOSUB 
to the appropriate module. After the program returns, the menu display 
module starts all over again. To add another command to the program, 
we need only add a PRINT line to the menu display, add a line number 
to the GOSUB list, and change the valid answer range from 1 through 
11 to 1 through 12. 
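The integer test is easy to mimic outside BASIC. In this Python sketch, val() is a rough, hypothetical stand-in for BASIC's VAL (it reads a leading decimal number, ignores exponents, and returns 0 otherwise), and round() plays the part of assigning to an integer variable. Any rounding rule works for the test, because a non-integer never equals its rounded value.

```python
import re


def val(text):
    """Rough stand-in for BASIC's VAL: a leading number, else 0."""
    match = re.match(r"\s*[-+]?(\d+\.?\d*|\.\d+)", text)
    return float(match.group()) if match else 0.0


def menu_choice(text, lowest=1, highest=11):
    """Return the command number, or None if the reply is not valid."""
    answer = val(text)
    answ = round(answer)          # like assigning ANSWER to ANSW%
    if answ == answer and lowest <= answ <= highest:
        return answ
    return None


for reply in ("7", "7.5", "abc", "12"):
    print(repr(reply), menu_choice(reply))
```

Adding a twelfth command then only means calling menu_choice(reply, highest=12), the counterpart of changing the range in the BASIC code.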


Catalog Data in Memory—Module 3—Line 3000

This module allows the user to catalog the system's internal data base. 
The module first prints out the number of observations per variable, the 
number of variables already defined, and the number remaining still open 
for definition. 

After displaying the catalog, module 3 calls module 19, which asks the 
user to hit a key to return to the command menu. If the program didn't 
do this, the display would vanish from the screen without giving the 
user time to think. 


Display Data—Module 4—Line 4000 

This module displays the data in one or more user selected variables. 
Module 16 is called to collect the variable names from the user. Module 
16 expects certain information. MAXNAMES is the maximum number of 
names the user is permitted to enter. In this case, the user can enter as 
many as have been defined. NEWNAMES = FALSE% tells module 16 not 
to enter the names in the symbol table. FORCE0% = FALSE% tells module 
16 that it should not automatically include "(CONST)" in the list. On 
return from module 16, LISTV(0) through LISTV(LISTLEN-1) hold the
numbers of the columns of X that contain the desired data. (If NAMEERR
is true, then module 16 found an error it couldn't handle.)

The data display module prints up to five variables to a line, up to 20 
lines to a screen. It then pauses (using module 19) to let the user look 
at the data. The program could be fancier here in several ways. We might 
want to display the data differently according to whether the screen 
displays 40 or 80 columns across. We might also want to allow the user 
to direct output to the printer rather than the screen. Finally, we could 
pretty up the display by using more graphics. 
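A sketch of the paging logic in Python, assuming that "five variables to a line" means groups of up to five variables side by side, one observation per line, twenty lines per screen. The column widths, number format, and names are arbitrary choices; the caller would pause between screens the way module 19 does.

```python
def display_pages(names, columns, per_line=5, lines_per_screen=20):
    """Yield one screenful of text at a time: up to per_line variables
    across, up to lines_per_screen observations down."""
    for start in range(0, len(columns), per_line):
        group = columns[start:start + per_line]
        header = " " * 5 + "".join(f"{nm:>12}" for nm in names[start:start + per_line])
        rows = [f"{i + 1:>5}" + "".join(f"{col[i]:>12.5g}" for col in group)
                for i in range(len(group[0]))]
        for top in range(0, len(rows), lines_per_screen):
            yield "\n".join([header] + rows[top:top + lines_per_screen])


names = [f"V{j}" for j in range(7)]                           # hypothetical
columns = [[float(i * j) for i in range(45)] for j in range(7)]
pages = list(display_pages(names, columns))
print(pages[0])
```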
Enter Data—Module 5—Line 5000

This module allows the user to enter the data for a single variable. We 
again use module 16, though this time we only allow one name to be 
specified and we ask module 16 to place the name as a new entry in the 
symbol table. Once the variable name is entered, the user is prompted
for the data sequentially. Notice that there is no way for the user to "get 
out" of the data entry sequence except to follow it through to the end. 


Edit Data—Module 6—Line 6000 

Even a "plain vanilla" program must allow the user some way to correct 
mistakes. First, we use module 16 to ask the user which variable he or 
she wishes to edit. Once we know the variable, we ask for an observation
number. After displaying the current value, the program asks the user to
specify a new value. Then the program displays both the new and old
values for the user. The program keeps prompting for new observation
numbers until the user responds with the ENTER key alone. 

Save Data to Disk—Module 7—Line 7000 

Serious statistical work is rarely completed in a single sitting. Module 7 
allows the user to dump the system's database to disk for later retrieval. 
(The user also gets some protection against lost time due to power failure 
in this way.) The disk storage format is chosen for simplicity rather than
efficiency. On the first line we dump out MAXVAR, NUMVAR, and 
NUMOBS. The next NUMVAR lines contain the contents of NAMES$. 
Finally, we dump the first NUMVAR columns of X. This simple format 
makes it possible to access the saved data from another program or to 
use another program to create data which can be read into our Statistical 
Analysis Program. 
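A Python sketch of the same plain-text layout, with a reader for the reverse direction (what module 8 does). The comma-separated first line and the one-value-per-line, column-by-column order for the data are assumptions, because the text fixes only the order of the three parts. The file name is hypothetical.

```python
import os
import tempfile


def save_database(path, maxvar, names, columns):
    """Line 1: MAXVAR, NUMVAR, NUMOBS. Next NUMVAR lines: the names.
    Then the first NUMVAR columns of X, one value per line, column by
    column (the value layout is an assumption)."""
    numvar, numobs = len(names), len(columns[0])
    with open(path, "w") as f:
        f.write(f"{maxvar},{numvar},{numobs}\n")
        for name in names:
            f.write(name + "\n")
        for col in columns[:numvar]:
            for value in col:
                f.write(repr(float(value)) + "\n")


def load_database(path):
    """Read the same layout back (the module 8 direction)."""
    with open(path) as f:
        lines = f.read().splitlines()
    maxvar, numvar, numobs = (int(v) for v in lines[0].split(","))
    names = lines[1:1 + numvar]
    values = [float(v) for v in lines[1 + numvar:]]
    columns = [values[j * numobs:(j + 1) * numobs] for j in range(numvar)]
    return maxvar, names, columns


path = os.path.join(tempfile.gettempdir(), "stats_demo.dat")   # hypothetical
save_database(path, 50, ["(CONST)", "PRICE"], [[1.0, 1.0, 1.0], [2.5, 3.0, 4.25]])
maxvar, names, columns = load_database(path)
print(maxvar, names, columns)
```

Because everything is plain text, one value or name per line, another program can read or produce such a file without knowing anything about BASIC.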

What happens if the user specifies a file name that already exists? 
BASIC will merrily write over an existing file, but it would be better to 
provide the user with at least some degree of protection against
inadvertently wiping out important data. We use the following program trick
to provide some protection. Before OPENing the output file, the program
tries "NAME FILENAME$ AS FILENAME$". This command gives a BASIC
error message "File already exists" if FILENAME$ is on the disk and
"File not found" if FILENAME$ is a new file. The error-trapping module,
18, checks to see if either of these errors occurred. In case of "File not
found," module 18 RESUMEs execution as if nothing had happened. If
this is a duplicate file, then module 18 asks the user for confirmation
before allowing execution to proceed.
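Python can test for the file directly instead of provoking an error, so the same protection reduces to a few lines. This is a sketch with a hypothetical confirm callback; in an interactive program it could read the user's reply with input() and return True only for a "Y".

```python
import os
import tempfile


def open_for_save(path, confirm):
    """Open path for writing; if it already exists, overwrite only when
    confirm(path) returns True. Returns a file object or None."""
    if os.path.exists(path) and not confirm(path):
        return None
    return open(path, "w")


demo = os.path.join(tempfile.gettempdir(), "overwrite_demo.dat")   # hypothetical
with open(demo, "w") as f:
    f.write("important data\n")

declined = open_for_save(demo, confirm=lambda p: False)   # user answers "no"
with open(demo) as f:
    survived = f.read()
print(declined, repr(survived))
```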

After storing the data on disk, the program closes the file and prints 
a message to the user before returning to the command menu. 
Retrieve Data From Disk—Module 8—Line 8000

The disk retrieval module complements the disk storage module. This 
module retrieves a data base that had previously been stored on a disk 
file. The module first OPENs for input the user specified disk file. (The 
error-handling module swings into operation if a nonexistent file is
specified, so that the program won't bomb.) Once MAXVAR, NUMVAR, and
NUMOBS are known, module 12 is called to reinitialize the program and 
allocate storage. (A more complicated program might add the contents 
of the disk file to the existing database rather than reinitializing the 
program.) Setting the flag DISKFILE% = TRUE% lets module 12 know 
that it needn't prompt the user for NUMOBS. 

Restart Program—Module 13—Line 13000 

Restarting the program is easy. Module 13 sets appropriate flags and 
calls module 12, which does all the work. 

Exit Program—Module 14—Line 14000

One can always let the user hit a Ctrl-Break to end a BASIC program, 
but it's a lot more graceful to provide a specific command. Module 14 
checks with the user to be sure an exit is intended, thus preventing 
accidental loss of valuable information. Use of the END statement also 
ensures that all files have been closed properly. 

Descriptive Statistics—Module 9—Line 9000

Modules 9, 10, and 11 actually do some "productive" work for the user. 
Module 9 requests a list of variable names, using module 16, and then 
prints the mean, standard deviation, and variance of each of the listed 
variables. The 8087 procedure SUM is used to collect the sum of each 
variable and the 8087 routine INPROD collects the sum of the squared 
observations for each variable. Using 8087 routines for these procedures 
is almost as efficient as writing the entire module in assembly language, 
since these are the only parts of the module whose execution time is 
proportional to the number of observations. 
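Working from the two totals rather than from deviations gives a one-pass formula. The Python sketch below (names hypothetical) mirrors that split: one plain sum, where the program uses SUM, and one sum of squares, where it uses INPROD of a variable with itself.

```python
import math


def describe(xs):
    """Mean, standard deviation, and variance from two running totals."""
    n = len(xs)
    total = sum(xs)                       # what SUM supplies
    sum_sq = sum(x * x for x in xs)       # what INPROD(x, x) supplies
    mean = total / n
    # Clamp at zero: rounding can leave a tiny negative for constant data.
    var = max((sum_sq - total * total / n) / (n - 1), 0.0)
    return mean, math.sqrt(var), var


m, s, v = describe([2.0, 4.0, 4.0, 5.0, 7.0])
print("mean", m, "std dev", s, "variance", v)
```

This form subtracts two large numbers when the mean is large relative to the spread, so rounding error can creep in; extra working precision in the accumulation helps.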


Correlation—Module 10—Line 10000 

The correlation module accepts a list of variable names (via module 16 
again) and calculates the correlation coefficient between every pair of 
names on the list. This requires the sum of $x_i$ times $x_j$ for each pair,
plus the sum of each variable. The easiest, though slightly inefficient,
way to get the sum of a variable is to take an inner product with a vector
of ones. Module 10 accomplishes this by setting FORCE0% = TRUE% in
order to guarantee that "(CONST)" is the first variable in LISTV. Module
17 is called to collect the "product-moment matrix" (the name given to 
the matrix of inner products of variable i with variable j). Once all the 
hard work is done in module 17, the correlation module prints out the 
correlation coefficients, checking as it goes along to avoid a "Division by 
zero" error. 
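Once the product-moment matrix includes the constant, everything the correlation module needs can be read off it: the (0,0) entry is n, row 0 holds the sums, and the remaining entries are the cross products. A Python sketch of that bookkeeping follows (names and data hypothetical). It builds the full matrix for simplicity and tests for an exactly zero variance; a real program would probably compare against a small tolerance instead.

```python
import math


def product_moment(cols):
    """Full matrix of inner products; cols[0] must be the column of ones."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def correlations(cols):
    """Correlation of every pair of variables (after the constant)."""
    m = product_moment(cols)
    n = m[0][0]                               # ones . ones = n
    result = {}
    for i in range(1, len(cols)):
        for j in range(i + 1, len(cols)):
            mean_i, mean_j = m[0][i] / n, m[0][j] / n
            cov = m[i][j] / n - mean_i * mean_j
            var_i = m[i][i] / n - mean_i ** 2
            var_j = m[j][j] / n - mean_j ** 2
            denom = math.sqrt(max(var_i, 0.0) * max(var_j, 0.0))
            result[(i, j)] = cov / denom if denom > 0.0 else None
    return result


ones = [1.0] * 4
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
z = [5.0, 5.0, 5.0, 5.0]
r = correlations([ones, x, y, z])
print(r)
```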

Multiple Regression—Module 11—Line 11000 

The multiple regression module uses module 16 twice, first to get the 
number of the dependent variable and second to get a list of numbers 
of the independent variables. The matrix X'y and the sum of $y^2$ are formed
using INPROD. (Note that "X" refers here to only those columns of the
database specified by the user in the list of independent variables.)
Module 17 is called to form the matrix X'X. Module 17 only fills in the upper
triangle of X'X, since the matrix is symmetric. We copy the upper half
into the lower half since the matrix inversion subroutine expects to see
the entire matrix. The 8087 routine INV is called to invert X'X. REALERR
is used to check that the inversion routine only produced normal
numbers. Finally GINPROD is used to multiply $(X'X)^{-1}$ by X'y and to form
several auxiliary statistics. Results are then printed. As with the
correlation module, all the hard number crunching is done by module 17.
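The symmetric copy and the sanity check on the inverse are short. In this Python sketch (hypothetical names), all_normal() is only a rough analogue of REALERR: it checks that every entry is finite (neither infinite nor NaN) and does not reject denormals the way a test for normal numbers would.

```python
import math


def mirror_upper(a):
    """Copy the upper triangle into the lower one (X'X is symmetric)."""
    for i in range(len(a)):
        for j in range(i):
            a[i][j] = a[j][i]
    return a


def all_normal(a):
    """Rough analogue of REALERR: every entry is a finite number."""
    return all(math.isfinite(v) for row in a for v in row)


xtx = [[4.0, 2.0, 1.0],
       [0.0, 3.0, 5.0],
       [0.0, 0.0, 6.0]]           # only the upper triangle filled in
mirror_upper(xtx)
print(xtx, all_normal(xtx))
```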

Begin Program Execution—Module 1—Line 1000

Module 1 is quite simple. Module 12 is called after flag FIRSTIME% is
set to indicate that this is the first time through the program and flag 
DISKFILE% is set to indicate that this is not a call from module 8. The 
latter flag is logically redundant, but keeps the call to module 12 consistent 
with other parts of the program. By and large, when programming, 
consistency is worth a little redundancy. 

Some programmers prefer to place program initialization code at the 
beginning of a program. In fact, some programming languages require 
one to do so. (The IBM Personal Computer BASIC Compiler for example!) 
With the BASIC interpreter, the placement of initialization code is largely 
a matter of taste. 



224 8087 Applications and Programming 

Allocation and Program Initialization—Module 12—Line 12000

This module needs to consider two questions: Is this the first time the
program has been initialized? Is this initialization preparatory to loading 
a disk file? 

Suppose that this is the first time through the program. We need to
clear the BASIC workspace and set aside enough space to load in the
8087 routines. Then the 8087 routines must be loaded and calling
addresses set. (On the book diskette, all the programs in Chapter 9 are
grouped in a file named "VECTOR.SAV"; the programs from Chapter
10 are in file "MATRIX.SAV"; and "MATADV.SAV" has the programs
from Chapter 11. The addresses listed below reflect this arrangement. If
you group your routines differently, you should change the calling
addresses.) Next, the program offers the user the option of loading data
from disk. If the user invokes this option, module 8 is called. Note that
module 8 calls back to module 12, which is perfectly legal in BASIC,
though it is not allowed in many other programming languages. If the
user does not choose to load data from disk, the program asks for the
number of observations in the data.

Next the module determines how many variables will fit in memory. 
Since the data is stored in single precision, the data itself will require 
4*n*k bytes. Space must also be set aside for the regression and correlation 
modules, and for LISTV. The amount of storage needed for NAMES$ 
will vary according to the length of variable names chosen by the user. 
Our module figures out the amount of free space by using the FRE
function. It then figures out the maximum number of variables that will 
fit in the available space, leaving some spare room as a "fudge factor," 
and allocates storage. 
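The sizing step amounts to finding the largest k whose storage fits in the free space. The Python sketch below uses a hypothetical budget: only the 4*n*k bytes for single-precision data come from the text. The double-precision k-by-k product-moment matrix (8 bytes per element), 2 bytes per LISTV entry, a fixed allowance per name, and the fudge factor are assumptions. In the BASIC program the free space would come from FRE.

```python
def max_variables(free_bytes, numobs, name_bytes=12, fudge=2000):
    """Largest number of variables whose storage fits in free_bytes."""

    def needed(k):
        return (4 * numobs * k     # single-precision data (from the text)
                + 8 * k * k        # assumed double-precision k-by-k matrix
                + 2 * k            # assumed integer LISTV entries
                + name_bytes * k   # assumed average space per name
                + fudge)           # spare room

    k = 0
    while needed(k + 1) <= free_bytes:
        k += 1
    return k


print(max_variables(20000, 100))
```

Because the matrix term grows as k squared, doubling the free memory does not quite double the number of variables that fit.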

If this is not the first time through the initialization routine, then
module 12 must take one of two actions, depending on whether it is acting
as a service routine for module 8. If we are loading data from the disk, 
then NUMOBS, and so forth, is already known. Module 12 need only 
erase the old database and dimension storage afresh. If we are not loading 
data, then the job is almost the same as if this were the first time through 
the program, except that we can begin directly with asking the user for 
the number of observations. 

Module 12 is very "implementation dependent." For example, if we 
wanted to use another 8087 assembly language routine, we would have 
to change this module. While a new module might be programmed to 
load its own routines, the initialization module needs to know how much 
space to leave in the CLEAR statement. 

If we wanted to use the BASIC compiler in place of the interpreter, 
this module would have to be moved to the front of the program, because 
the BASIC compiler requires DIM and DEF statements to appear before
executable operations. Unfortunately, the BASIC compiler also requires 
fixed size dimensions for all the matrices, so to use the compiler we 
would be forced to create our own storage allocation mechanism. This 
would affect data storage in the entire program, not only module 12. 

Insert Name in Symbol Table—Module 15—Line 15000

Module 15 attempts to place the name in NAMEIS$ in the symbol table 
and return its symbol table location in NAMELOC. Two possibilities 
might prevent completion of this task. First, the symbol table might be 
full, indicating that there is no more room in the database. Second, the 
name might already be defined. In either of these cases, module 15 prints 
an error message, sets the flag NAMEERR to TRUE%, and returns. If 
neither error arises, the module places NAMEIS$ in the first open location 
in NAMES$, adds one to the variable count in NUMVAR, and returns 
the proper value in NAMELOC. 
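Module 15's logic, sketched in Python with a list standing in for NAMES$ (its length playing the role of NUMVAR) and maxvar as the table's capacity. The function name and the error messages are hypothetical.

```python
def insert_name(names, maxvar, name):
    """Add name to the symbol table; return (location, error_flag)."""
    if len(names) >= maxvar:
        print("SYMBOL TABLE FULL: NO ROOM FOR", name)
        return None, True
    if name in names:
        print(f'"{name}" IS ALREADY DEFINED')
        return None, True
    names.append(name)                  # first open slot; NUMVAR grows by one
    return len(names) - 1, False


table = ["(CONST)"]                     # column 0 is always the constant
print(insert_name(table, 3, "PRICE"))
```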

Collect Names From User—Module 16—Line 16000

We've called this module from many other modules. Essentially, its job 
is to collect a series of names from the user and return their symbol table
locations in LISTV. Module 16 treats collecting one variable and more 
than one variable as different cases, mostly so that we can give the user 
more intelligent prompts. 

In the first case, MAXNAMES equals 1. We ask the user for a name, 
and call module 15 if NEWNAMES is TRUE%. If an undefined name is 
entered improperly, the user is given the opportunity to re-enter the 
name or to give a null response. A null response, or an error, causes the 
module to return with NAMEERR set to TRUE%. When a correct name 
is given, LISTV(0) is set to the location of the name and module 16 returns
with NAMEERR set to FALSE%. 

The task of module 16 is considerably more complicated when a
series of names is called for. We could prompt the user for one name at
a time, but it's friendlier to allow the user to enter a series of names separated
by spaces. (As a side issue, the module must set LISTV(0) = 0 if FORCE0%
requires us to include the constant term.) We accept a "variable list" from
the user in ANSWER$. The module scans ANSWER$ looking for a space.
The substring from the beginning of the scan to the space is taken as a
variable name. We start scanning for the next name after the space. The
scanning process is complete when the end of the string is reached. We
check the names one at a time either by running through NAMES$ or
by using module 15, depending on the value of NEWNAMES. (Notice
that in this way an entire list of variables can be entered into the symbol
table at once, even though the program does not use this feature.) If an
error is found in processing the list of names, the user is asked to
re-enter the entire list.
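A Python sketch of the list-collecting case, for a lookup-only call (hypothetical names and messages). It splits on runs of whitespace, a simplification of the character-by-character scan for spaces described above, and it quotes the offending name in its error message.

```python
def parse_name_list(answer, names, force_const=False):
    """Turn a space-separated list of names into column numbers.

    Returns (listv, error_flag). names plays the role of NAMES$.
    """
    listv = [0] if force_const else []    # (CONST) is always column 0
    for word in answer.split():
        if word not in names:
            print(f'UNDEFINED VARIABLE NAME: "{word}"')
            return [], True
        listv.append(names.index(word))
    return listv, False


names = ["(CONST)", "PRICE", "QTY"]       # hypothetical symbol table
print(parse_name_list("QTY PRICE", names))
print(parse_name_list("PRICE\x7f QTY", names))   # \x7f does not print
```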

Notice that the error message for an undefined name displays the
offending string within quotes. This is more than a nicety. Suppose the
user enters variables named "X" and "Y" and later tries to retrieve
"X<non-printing character>Y". That string doesn't match "X" (or "Y"),
even though on the screen it may look like "X" followed by a space. By
placing the string in quotes we increase the chances that the user will
notice the presence of a nondisplaying character.

Module 16 could be usefully modified by putting some restrictions on 
the legal variable names. Since some other modules only print variable 
names of limited length, we might want to restrict name length at the 
time of definition. We also might want to modify this module to accept 
upper and lower case characters without distinguishing between them. 
Finally, notice that the user might well enter a string that "wraps around" 
the end of the line, which is perfectly acceptable, or a string that is longer 
than 255 characters, which will cause an error that is trapped by the
error-handling module.


Collect Product-Moment Matrix—Module 17—Line 17000

Almost the entire computational time of the program is spent in this 
module. The module creates a double precision matrix named XPX#. 
Element i,j, in the upper triangle of XPX#, is set to the inner product of 
the ith and jth variables in LISTV. The 8087 routine INPROD really does 
all the work. 
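Module 17's core loop, sketched in Python (names hypothetical): fill only the upper triangle, each element being one inner product, which is the step the program hands to INPROD. The lower triangle is simply left at zero here.

```python
def upper_product_moment(cols, listv):
    """Element [i][j], i <= j, is the inner product of columns listv[i]
    and listv[j]; elements below the diagonal stay 0.0."""
    k = len(listv)
    xpx = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            a, b = cols[listv[i]], cols[listv[j]]
            xpx[i][j] = sum(p * q for p, q in zip(a, b))   # the INPROD step
    return xpx


cols = [[1.0, 1.0, 1.0],       # column 0: the constant
        [1.0, 2.0, 3.0],
        [2.0, 0.0, 1.0]]
xpx = upper_product_moment(cols, [0, 1, 2])
print(xpx)
```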


Error-handling—Module 18—Line 18000 

Nothing is worse in a canned program, even a simple one like this, than 
getting a BASIC error message. The whole point of a program being 
"canned" is that the user needn't understand its innards. Our program 
doesn't offer quite this level of protection, but it does catch a few possible 
errors. For example, if the user enters too many characters in response 
to the name prompt in module 16, we'd like to allow him or her another 
shot rather than have the program die. In addition, this routine handles 
a couple of places where we induce deliberate "errors," such as in the 
specification of file names. 

Notice that we are quite careful to check the line number on which the 
error occurred before handling the error. In this way, we avoid "fixing" 
an error the program isn't prepared to handle. 
Screen-handling—Module 19—Line 19000

Since the computer can display text faster than we can read, it's very 
convenient to have a way to make the screen stand still. Module 19 
accomplishes this by going round in circles until the user hits a key. 


A Little More on Programming Strategy 

Our "8087 Statistical Analysis Program" is a very heavy number cruncher. 
Did you notice that of the several hundred lines of code, 8087 routines 
are referenced only nine times!?! Such a ratio is not in the least unusual 
for a general purpose program. However, these few references are
responsible for almost all the speed and accuracy advantage of using the
8087. 


5 REM PROGRAM FOR STATISTICAL ANALYSIS
10 REM THE PRINCIPAL SECTIONS OF THIS PROGRAM BEGIN AT LINES:
20 REM 1000 PROGRAM EXECUTION BEGINS
30 REM 2000 MENU DISPLAY
40 REM 3000 DATA CATALOG
50 REM 4000 DATA DISPLAY
60 REM 5000 DATA INPUT
70 REM 6000 DATA EDITING
80 REM 7000 SAVE DATA
90 REM 8000 RETRIEVE DATA
100 REM 9000 DESCRIPTIVE STATISTICS
110 REM 10000 CORRELATION
120 REM 11000 MULTIPLE REGRESSION
130 REM 12000 ALLOCATE STORAGE AND INITIALIZE PROGRAM
140 REM 13000 RESTART PROGRAM
150 REM 14000 EXIT PROGRAM
160 REM 15000 INSERT NAME IN SYMBOL TABLE
170 REM 16000 ASK USER FOR LIST OF NAMES
180 REM 17000 COLLECT PRODUCT MOMENT MATRIX
190 REM 18000 HANDLE ERRORS
200 REM 19000 HOLD SCREEN SCROLLING



IDDD 

FIRSTIME5C: 

=-l 'FLAG FIRST TIME THROUGH PROGRAM AS 

TRUE 

IDID 

DISKFILEJC^ 

=D 'NOT LOADING A DISKFILE 



IDED 

GOSUB lEDDD 



EDDD 

CLS 





EDD5 

PRINT '^COMMANDS OF THE ADA7 STATISTICAL ANALYSIS 



PROGRAM" 




EDID 

PRINT "1 

CATALOG DATA IN MEMORY" 



EDED 

PRINT "E 

DISPLAY DATA" 



ED3D 

PRINT "3 

ENTER DATA" 



EDMD 

PRINT "M 

EDIT DATA" 
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205D PRINT ”5 SAVE DATA TO DISK" 

EObO PRINT "b RETRIEVE DATA FROM DISK" 

ED7D PRINT "7 MEANS-. STANDARD DEVIATIONS. AND VARIANCES" 

SOaO PRINT "a CORRELATIONS BETWEEN VARIABLES" 

EmO PRINT "T MULTIPLE REGRESSION" 

EIDO PRINT "ID RESTART PROGRAM" 

Elio PRINT "11 EXIT PROGRAM TO BASIC" 

ElED PRINT 

E13D INPUT "ENTER DESIRED SERVICE (1-11) >"^ANSIi)ER 
ElMD ANSii)^ = ANSUER 

EISD IF ANSU>:=ANSUER AND ANSbi:c>=l AND ANSlil7.<=ll THEN EEDD 
Elba PRINT "RESPONSE RE(2UIRES AN INTEGER BETWEEN 1 AND 11" 
E17D GOSUB l=iD3D:G0T0 EDOO 

EEDD ON ANSW^ GOSUB EDDD. MDDD.SDDD.bDDD.TaDD.aODa.'^IDDD. 

IDODD.llDDD.lEDDD.mODD 

E3DD REM RETURN HERE AFTER PERFORMING SERVICE 

EMDD GOTO EDDD 

3000 REM DATA CATALOG
3010 CLS
3020 PRINT "NUMBER OF OBSERVATIONS PER VARIABLE: ";NUMOBS
3030 PRINT "NUMBER OF DEFINED VARIABLES: ";NUMVAR
3040 PRINT "NUMBER OF REMAINING VARIABLES: ";MAXVAR-NUMVAR
3050 PRINT "DEFINED VARIABLES ARE:"
3060 FOR I=0 TO NUMVAR-1:PRINT NAMES$(I):NEXT I
3070 GOSUB 19000
3080 RETURN
4000 REM
4010 REM DATA DISPLAY
4020 CLS
4030 PRINT "DATA IN ONE OR MORE VARIABLES MAY BE DISPLAYED"
4040 MAXNAMES=NUMVAR:NEWNAMES=FALSE%:FORCED%=FALSE%
4050 GOSUB 16000:IF NAMEERR THEN RETURN
4060 REM VARIABLE NUMBERS ARE IN LISTV 0 THROUGH LISTLEN-1
4070 REM PRINT 4 VARIABLES ON A LINE, 20 OBSERVATIONS PER SCREEN
4071 IF LISTLEN=0 THEN RETURN
4080 FIRSTVAR=0:LASTVAR=3:FIRSTOB=0:LASTOB=19
4090 IF LASTVAR>LISTLEN-1 THEN LASTVAR=LISTLEN-1
4100 IF LASTOB>NUMOBS-1 THEN LASTOB=NUMOBS-1
4110 CLS
4120 PRINT "OBSERVATION",
4130 FOR I=FIRSTVAR TO LASTVAR
4140 PRINT USING "\            \";NAMES$(LISTV(I));
4150 NEXT I
4160 PRINT
4170 FOR I=FIRSTOB TO LASTOB
4180 PRINT I,
4190 FOR J=FIRSTVAR TO LASTVAR
4200 PRINT X(I,LISTV(J)),
4210 NEXT J
4220 PRINT
4230 NEXT I
4240 REM ONE SCREENFUL IS PRINTED
4250 GOSUB 19030
4270 IF LASTVAR=LISTLEN-1 AND LASTOB=NUMOBS-1 THEN RETURN
4280 IF LASTOB=NUMOBS-1 THEN 4300
4290 FIRSTOB=LASTOB+1:LASTOB=FIRSTOB+19:GOTO 4100
4300 REM NEXT SET OF VARIABLES
4310 FIRSTVAR=LASTVAR+1:LASTVAR=FIRSTVAR+3
4320 FIRSTOB=0:LASTOB=19
4330 GOTO 4090

5000 REM
5010 REM ENTER DATA
5020 CLS
5030 PRINT "ENTER NEW VARIABLE NAME"
5040 MAXNAMES=1:NEWNAMES=TRUE%:FORCED%=FALSE%
5050 GOSUB 16000:IF NAMEERR THEN RETURN
5055 REM VARIABLE IN LISTV(0)
5060 PRINT "ENTER DATA - (<ENTER> ALONE MEANS 0)"
5070 FOR I=0 TO NUMOBS-1
5080 PRINT NAMES$(LISTV(0));"(";I;") >";
5090 INPUT "",X(I,LISTV(0))
5100 NEXT I
5110 RETURN
6000 REM
6010 REM EDIT DATA
6020 CLS
6030 PRINT "ENTER NAME OF VARIABLE TO BE EDITED"
6040 MAXNAMES=1:NEWNAMES=FALSE%:FORCED%=FALSE%
6050 GOSUB 16000:IF NAMEERR THEN RETURN
6060 REM VARIABLE IN LISTV(0)
6070 LVAR=LISTV(0)
6080 CLS
6090 PRINT "OBSERVATION NUMBER TO BE CHANGED";
6100 INPUT " <ENTER> ALONE RETURNS TO MAIN MENU >",ANSWER$
6110 IF ANSWER$="" THEN RETURN
6120 ANSWER=VAL(ANSWER$):ANSW%=ANSWER
6130 IF ANSWER=ANSW% AND ANSW%>=0 AND ANSW%<NUMOBS THEN 6160
6140 PRINT "OBSERVATION MUST BE INTEGER BETWEEN 0 AND ";NUMOBS-1
6150 GOTO 6090
6160 PRINT NAMES$(LVAR);"(";ANSW%;") = ";X(ANSW%,LVAR);
6170 INPUT "  NEW VALUE?>",ANSWER
6180 PRINT NAMES$(LVAR);"(";ANSW%;") WAS ";X(ANSW%,LVAR);
6190 PRINT " IS NOW ";ANSWER
6200 X(ANSW%,LVAR)=ANSWER
6210 GOTO 6090

7000 REM
7010 REM SAVE DATA ON DISK FILE
7020 REM FIRST LINE HAS MAXVAR,NUMVAR,NUMOBS
7030 REM THEN THE VARIABLE NAMES IN ORDER
7040 REM THEN THE DATA IN EACH VARIABLE IN ORDER
7050 CLS
7060 INPUT "ENTER DISK FILE NAME> ",FILENAME$
7065 NAME FILENAME$ AS FILENAME$
7070 OPEN FILENAME$ FOR OUTPUT AS #1
7080 WRITE #1,MAXVAR,NUMVAR,NUMOBS
7090 FOR I=0 TO NUMVAR-1:WRITE #1,NAMES$(I):NEXT I
7100 FOR I=0 TO NUMVAR-1
7110 FOR J=0 TO NUMOBS-1
7120 WRITE #1,X(J,I)
7130 NEXT J,I
7140 CLOSE #1
7150 PRINT "DATA FILED IN ";FILENAME$
7160 GOSUB 19000:RETURN

8000 REM
8010 REM RETRIEVE DATA FROM DISK FILE
8020 CLS
8030 INPUT "ENTER DISK FILE NAME> ",FILENAME$
8040 OPEN FILENAME$ FOR INPUT AS #1
8050 INPUT #1,MAXVAR,NUMVAR,NUMOBS
8060 DISKFILE%=TRUE%
8090 GOSUB 12000
8100 FOR I=0 TO NUMVAR-1:INPUT #1,NAMES$(I):NEXT I
8110 FOR I=0 TO NUMVAR-1
8120 FOR J=0 TO NUMOBS-1
8130 INPUT #1,X(J,I)
8140 NEXT J,I
8150 CLOSE #1
8160 PRINT "DATA RETRIEVED FROM ";FILENAME$
8170 GOSUB 19000:RETURN
9000 REM
9010 REM PRINT MEANS, STANDARD DEVIATIONS, VARIANCES
9020 CLS
9040 MAXNAMES=NUMVAR:NEWNAMES=FALSE%:FORCED%=FALSE%
9050 GOSUB 16000:IF NAMEERR THEN RETURN
9060 REM VARIABLE NUMBERS ARE IN LISTV 0 THROUGH LISTLEN-1
9070 PRINT "VARIABLE"," MEAN ","STANDARD DEVIATION","VARIANCE"
9090 REM COLLECT SUM OF EACH VARIABLE IN SUM#
9100 REM COLLECT SUM-SQUARE OF EACH VARIABLE IN SUMSQ#
9110 FOR I=0 TO LISTLEN-1
9112 LI=LISTV(I)
9120 SUM#=0:CALL SUM%(X(0,LI),NUMOBS,SUM#)
9130 SUMSQ#=0:CALL INPROD%(X(0,LI),X(0,LI),SUMSQ#,NUMOBS)
9140 AVERAGE#=SUM#/NUMOBS
9150 VARIANCE#=(SUMSQ#-SUM#*SUM#/NUMOBS)/(NUMOBS-1)
9160 PRINT NAMES$(LI),AVERAGE#,SQR(VARIANCE#),VARIANCE#
9170 NEXT I
9180 GOSUB 19000
9190 RETURN
10000 REM
10010 REM PRINT CORRELATIONS
10020 CLS
10040 MAXNAMES=NUMVAR:NEWNAMES=FALSE%:FORCED%=TRUE%
10050 GOSUB 16000:IF NAMEERR THEN RETURN
10060 REM VARIABLE NUMBERS ARE IN LISTV 0 THROUGH LISTLEN-1
10080 CORRERR$="\            \ \           \ WHOOPS CONSTANT VARIABLE"
10090 PRINT "VARIABLE-1","VARIABLE-2","CORRELATION COEFFICIENT"
10100 REM HAVE PRODUCT-MOMENT MATRIX COLLECTED IN XPX#
10110 GOSUB 17000
10120 FOR I=1 TO LISTLEN-1
10130 NM1$=NAMES$(LISTV(I))
10140 FOR J=I TO LISTLEN-1
10150 NM2$=NAMES$(LISTV(J))
10160 COV#=NUMOBS*XPX#(I,J)-XPX#(0,I)*XPX#(0,J)
10170 V1#=NUMOBS*XPX#(I,I)-XPX#(0,I)*XPX#(0,I)
10180 V2#=NUMOBS*XPX#(J,J)-XPX#(0,J)*XPX#(0,J)
10190 IF (V1#*V2#)<>0 THEN 10220
10200 PRINT USING CORRERR$;NM1$,NM2$
10210 GOTO 10230
10220 PRINT NM1$,NM2$,COV#/SQR(V1#*V2#)
10230 NEXT J,I
10240 GOSUB 19000
10250 RETURN
11000 REM
11010 REM MULTIPLE REGRESSION SECTION
11020 REM FIRST GET DEPENDENT VARIABLE
11030 REM THEN INDEPENDENT VARIABLES
11040 REM THEN GO TO WORK
11050 CLS
11060 PRINT "MULTIPLE REGRESSION"
11070 PRINT "ENTER DEPENDENT VARIABLE"
11080 MAXNAMES=1:NEWNAMES=FALSE%:FORCED%=FALSE%
11090 GOSUB 16000:IF NAMEERR THEN RETURN
11100 REM DEPENDENT VARIABLE IN LISTV(0)
11110 DEPVAR%=LISTV(0)
11120 PRINT "ENTER INDEPENDENT VARIABLES"
11130 MAXNAMES=NUMVAR:NEWNAMES=FALSE%:FORCED%=FALSE%
11140 GOSUB 16000:IF NAMEERR THEN RETURN
11150 REM VARIABLE NUMBERS ARE IN LISTV 0 THROUGH LISTLEN-1
11160 IF LISTLEN>=NUMOBS THEN 11600
11170 REM ALLOCATE REGRESSION STORAGE
11180 ERASE XPY#,XPXINV#,BETA#,SCRATCH,INDEX
11190 L1=LISTLEN-1
11200 DIM XPY#(L1),XPXINV#(L1,L1),BETA#(L1),SCRATCH(L1),INDEX(L1)
11210 REM NOW DO THE REGRESSION
11220 YSQR#=0:CALL INPROD%(X(0,DEPVAR%),X(0,DEPVAR%),YSQR#,NUMOBS)
11230 FOR I=0 TO LISTLEN-1
11240 CALL INPROD%(X(0,DEPVAR%),X(0,LISTV(I)),XPY#(I),NUMOBS)
11250 NEXT I
11260 GOSUB 17000 'COLLECT XPX# - UPPER HALF
11263 FOR I=0 TO LISTLEN-1
11264 FOR J=I+1 TO LISTLEN-1
11265 XPX#(J,I)=XPX#(I,J)
11266 NEXT J,I
11270 IER=0:TYPEX%=8:L2=LISTLEN*LISTLEN
11280 CALL INV%(XPX#(0,0),XPXINV#(0,0),SCRATCH(0),INDEX(0),IER,TYPEX%,LISTLEN)
11290 IF IER<>0 THEN 11700
11292 IFDEN=FALSE%:IFINF=FALSE%:IFNAN=FALSE%:ELEMENT%=0
11294 CALL REALERR%(XPXINV#(0,0),TYPEX%,L2,IFDEN,IFINF,IFNAN,ELEMENT%)
11296 IF (NOT IFDEN) AND (NOT IFINF) AND (NOT IFNAN) THEN 11300
11298 PRINT "WARNING NUMERICAL RESULTS HIGHLY SUSPECT"
11300 REM NOW FORM XPXINV# TIMES XPY#
11310 IONE=1
11320 FOR I=0 TO LISTLEN-1
11330 CALL GINPROD%(XPXINV#(I,0),XPY#(0),BETA#(I),TYPEX%,TYPEX%,LISTLEN,IONE,LISTLEN)
11340 NEXT I
11350 REM NOW FORM SUM SQUARE RESIDUALS AS Y'Y-BETA'X'Y
11360 TEMP#=0:CALL GINPROD%(BETA#(0),XPY#(0),TEMP#,TYPEX%,TYPEX%,IONE,IONE,LISTLEN)
11370 SSR#=YSQR#-TEMP#
11375 IF YSQR#=0 THEN PRINT "ZERO LHS VARIABLE??":GOTO 11510
11380 CLS
11390 S2#=SSR#/(NUMOBS-LISTLEN)
11410 PRINT "VARIABLE","COEFFICIENT","S.E.","T-STATISTIC"
11430 FOR I=0 TO LISTLEN-1
11440 SE#=SQR(S2#*XPXINV#(I,I))
11450 IF SE#<>0 THEN PRINT NAMES$(LISTV(I)),CSNG(BETA#(I)),CSNG(SE#),CSNG(BETA#(I)/SE#) ELSE PRINT NAMES$(LISTV(I)),CSNG(BETA#(I))
11460 NEXT I
11470 PRINT NUMOBS;" OBSERVATIONS ";LISTLEN;" VARIABLES"
11480 PRINT "STANDARD ERROR OF REGRESSION= ";SQR(S2#)
11490 PRINT "SUM SQUARE RESIDUALS= ";SSR#
11500 PRINT "R-SQUARED= ";1-SSR#/YSQR#
11510 GOSUB 19000
11520 RETURN
11600 PRINT "MORE OBSERVATIONS THAN INDEPENDENT VARIABLES REQUIRED"
11610 GOSUB 19000:RETURN
11700 PRINT "NUMERICALLY SINGULAR MATRIX"
11710 PRINT "EITHER VARIABLE INCLUDED TWICE OR TOO MUCH ";
11720 PRINT "MULTICOLLINEARITY"
11730 GOSUB 19000:RETURN
12000 REM
12010 REM ALLOCATE STORAGE AND INITIALIZE PROGRAM
12020 IF NOT FIRSTIME% THEN 12420
12030 CLEAR ,&H7F00 'SET ASIDE SPACE IF YOU HAVE LESS THAN
12040 DEFINT I-N
12045 ON ERROR GOTO 18000
12050 TRUE%=-1:FALSE%=0
12060 FIRSTIME%=FALSE%:WASFIRST%=TRUE%

12070 DEF SEG=&HEF0 'SUBROUTINE AREA

1ED7S VECT0R:c = Q:nATRIX>'=&HMED:nATADV:c=&H7^Q 

12080 BLOAD "VECTOR.SAV",VECTOR%:BLOAD "MATRIX.SAV",MATRIX%
12085 BLOAD "MATADV.SAV",MATADV%

lErnO INPR0D:'=MATRIX:' + &H1E1:GINPR0I>-/'=MATRIX:'+&H1AI3 
lEmi INV/. = MATADV:'+&HMF3:SUM"/C=VECT0RV. + D: 

realerr:'=vector:'+&hei)c 

12095 CLS
12100 PRINT "DO YOU WISH TO LOAD DATA FROM A DISK FILE (Y/N)>";
12110 INPUT "",ANSWER$
12120 IF ANSWER$="Y" OR ANSWER$="y" THEN GOSUB 8000:GOTO 12410
12130 INPUT "NUMBER OF OBSERVATIONS>",ANSWER$
12140 IF ANSWER$="" THEN RETURN
12150 ANSWER=VAL(ANSWER$)
12160 ANSW%=ANSWER
12170 IF (ANSW%=ANSWER) AND (ANSW%>0) THEN 12210
12180 PRINT "POSITIVE INTEGER REQUIRED. ";
12190 PRINT "<ENTER> RETURNS TO COMMAND MENU"
12200 GOTO 12100
12210 NUMOBS=ANSW%
12220 REM FOR K VARIABLES NEED ABOUT
12230 REM 4NK FOR X
12240 REM 16K*K FOR XPX,XPXINV
12250 REM AT LEAST 16K FOR NAMES
12260 REM 16K FOR XPY AND BETA
12270 REM BETTER LEAVE A LITTLE EXTRA FOR SAFETY, SAY
12280 REM USE 2000 + K(4N + 16K + 32)
12285 ERASE X,NAMES$
12290 SPACE=FRE(0)-2000
12300 K=INT(SPACE/NUMOBS/4):K1=SQR(SPACE/16)
12305 IF K>K1 THEN K=K1 'FIRST ESTIMATE FOR K
12310 IF (4*K*(NUMOBS+4*K+8))>SPACE THEN K=K-1:GOTO 12310
12320 MAXVAR=K
12330 IF MAXVAR>1 THEN 12360
12340 PRINT "TOO MANY OBSERVATIONS"
12350 GOTO 12130
12360 REM NOTE THAT BASIC INITIALIZES EVERYTHING TO ZERO
12370 K1=K-1:N1=NUMOBS-1:NBYK=NUMOBS*MAXVAR
12380 ERASE NAMES$,X
12385 DIM NAMES$(K1),X(N1,K1)
12390 NAMES$(0)="(CONST)"
12395 FOR I=0 TO N1:X(I,0)=1.0:NEXT I
12400 NUMVAR=1
12410 IF NOT WASFIRST% THEN RETURN
12415 GOTO 2000 'FAKE RETURN, GOSUB WIPED OUT BY CLEAR
12420 REM NOT THE FIRST TIME INITIALIZED
12430 ERASE X,NAMES$





12440 IF NOT DISKFILE% THEN 12095
12450 K1=MAXVAR-1:N1=NUMOBS-1:NBYK=NUMOBS*MAXVAR
12460 ERASE NAMES$,X
12465 DIM NAMES$(K1),X(N1,K1)
12470 RETURN
13000 REM
13010 REM RESTART PROGRAM
13020 FIRSTIME%=FALSE%:DISKFILE%=FALSE%
13030 GOSUB 12000
13040 RETURN
14000 REM
14010 REM EXIT PROGRAM
14020 INPUT "ARE YOU SURE YOU WANT TO EXIT (Y/N)>",ANSWER$
14030 IF ANSWER$="y" OR ANSWER$="Y" THEN END
14040 RETURN
15000 REM
15010 REM INSERT NAME IN SYMBOL TABLE
15020 REM NAME TO BE INSERTED IS IN NAMEIS$
15030 REM IF SYMBOL TABLE IS FULL, PRINT MESSAGE AND SET NAMEERR
15040 REM IF NOT A NEW NAME, PRINT MESSAGE AND SET NAMEERR
15050 REM OTHERWISE PUT NAMEIS$ IN NEXT LOCATION IN NAMES$
15060 REM REPORT ITS POSITION IN NAMELOC
15070 IF NUMVAR<MAXVAR THEN 15100
15080 PRINT "SYMBOL TABLE FULL !!!! NO NEW VARIABLES"
15090 NAMEERR=TRUE%:RETURN
15100 FOUNDIT%=FALSE%
15110 FOR I=0 TO NUMVAR-1
15120 IF NAMEIS$=NAMES$(I) THEN FOUNDIT%=TRUE%
15130 NEXT I
15140 IF NOT FOUNDIT% THEN 15170
15150 PRINT CHR$(34);NAMEIS$;CHR$(34);" ALREADY DEFINED - NOT A NEW NAME"
15155 GOSUB 19000
15160 NAMEERR=TRUE%:RETURN
15170 NAMELOC=NUMVAR
15180 NUMVAR=NUMVAR+1
15190 NAMES$(NAMELOC)=NAMEIS$
15200 RETURN
16000 REM
16010 REM COLLECT A LIST OF NAMES AND RETURN LOCATIONS IN LISTV
16020 REM A SINGLE NAME IS A SPECIAL CASE
16030 NAMEERR=FALSE%
16040 IF MAXNAMES>1 THEN 16500
16050 INPUT "VARIABLE NAME IS?>",NAMEIS$
16060 IF NAMEIS$="" THEN NAMEERR=TRUE%:RETURN
16070 IF NOT NEWNAMES THEN 16100
16080 GOSUB 15000
16090 IF NAMEERR THEN RETURN ELSE 16180
16100 NAMELOC=-1
16110 FOR I=0 TO NUMVAR-1
16120 IF NAMES$(I)=NAMEIS$ THEN NAMELOC=I
16130 NEXT I
16140 IF NAMELOC<>-1 THEN 16180
16150 PRINT CHR$(34);NAMEIS$;CHR$(34);" NOT DEFINED"
16160 PRINT "RE-ENTER NAME OR <ENTER> TO RETURN TO COMMAND MENU"
16170 GOTO 16050
16180 REM PUT NAMELOC IN LISTV
16190 LISTLEN=1
16200 LISTV(0)=NAMELOC
16210 RETURN
16500 REM COME HERE TO COLLECT A SERIES OF VARIABLES
16501 REM IF FORCED% THEN INCLUDE CONSTANT AUTOMATICALLY
16502 IF NOT FORCED% THEN LISTLEN=0 ELSE LISTV(0)=0:LISTLEN=1
16510 INPUT "ENTER VARIABLE NAME(S) SEPARATED BY A SPACE>",ANSWER$
16520 IF ANSWER$="" THEN NAMEERR=TRUE%:RETURN
16530 FOR I=LISTLEN TO MAXNAMES-1:LISTV$(I)="":NEXT I
16540 LOOKFROM=1
16550 REM RETRIEVE A VARIABLE NAME
16560 SPACELOC%=INSTR(LOOKFROM,ANSWER$," ")
16570 IF SPACELOC%=0 THEN SPACELOC%=LEN(ANSWER$)+1
16580 NAMEIS$=MID$(ANSWER$,LOOKFROM,SPACELOC%-LOOKFROM)
16590 NAMELOC=-1
16600 IF NAMEIS$="" THEN 16730
16610 IF NOT NEWNAMES THEN 16630
16620 GOSUB 15000:IF NAMEERR THEN RETURN
16630 FOR I=0 TO NUMVAR-1
16640 IF NAMES$(I)=NAMEIS$ THEN NAMELOC=I
16650 NEXT I
16660 IF NAMELOC<>-1 THEN 16700
16670 PRINT CHR$(34);NAMEIS$;CHR$(34);" NOT DEFINED"
16680 PRINT "RE-ENTER LIST OR <ENTER> TO RETURN TO COMMAND MENU"
16690 GOTO 16500
16700 REM PUT NAMELOC IN LISTV
16710 LISTV(LISTLEN)=NAMELOC
16720 LISTLEN=LISTLEN+1
16730 LOOKFROM=SPACELOC%+1
16740 IF LOOKFROM>LEN(ANSWER$) THEN RETURN
16750 IF LISTLEN<MAXNAMES THEN 16560
16760 PRINT "TOO MANY NAMES"
16770 GOTO 16680
17000 REM
17010 REM COLLECT PRODUCT MOMENT MATRIX IN UPPER HALF OF XPX#
17020 ERASE XPX#:L1=LISTLEN-1:DIM XPX#(L1,L1)
17030 FOR I=0 TO LISTLEN-1
17040 FOR J=I TO LISTLEN-1
17050 CALL INPROD%(X(0,LISTV(I)),X(0,LISTV(J)),XPX#(I,J),NUMOBS)
17060 NEXT J,I
17070 RETURN
18000 REM
18010 REM HANDLE A FEW ERRORS HERE
18020 REM DID WE RUN OUT OF SPACE?
18030 IF ERR<>7 AND ERR<>14 THEN 18070
18040 PRINT "PROGRAM RAN OUT OF MEMORY IN LINE ";ERL
18050 PRINT "SORRY. . ."
18060 STOP
18070 REM DID WE TRY TO READ FROM A NON-EXISTENT FILE?
18080 IF ERR<>53 OR ERL<>8040 THEN 18120
18090 IF FILENAME$="" THEN RESUME 8170 'BACK TO MENU
18100 PRINT "CAN'T FIND ";FILENAME$
18110 RESUME 8030 'TRY AGAIN
18120 REM IS THIS A NEW OUTPUT FILE?
18130 IF ERR<>53 OR ERL<>7065 THEN 18150
18140 RESUME NEXT
18150 IF ERR<>58 OR ERL<>7065 THEN 18200
18160 PRINT "FILE ALREADY EXISTS, ARE YOU SURE? (Y/N)";
18170 INPUT "",ANSWER$
18180 IF ANSWER$="y" OR ANSWER$="Y" THEN RESUME NEXT
18190 RESUME 7060
18200 IF ERR<>5 OR (ERL<>12285 AND ERL<>12430 AND ERL<>11180 AND ERL<>17020 AND ERL<>12460 AND ERL<>12380) THEN 18220
18210 RESUME NEXT 'OK, WE JUST ERASED SOMETHING THAT WASN'T THERE
18220 ON ERROR GOTO 0
18230 END

19000 REM HOLD SCREEN
19010 PRINT "HIT ANY KEY TO RETURN TO COMMAND MENU>";
19020 IF INKEY$="" THEN 19020 ELSE RETURN
19030 PRINT "HIT ANY KEY TO CONTINUE>";
19040 IF INKEY$="" THEN 19040 ELSE RETURN
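Modules 9000 and 10000 never compute deviations from the mean directly. Instead they work from raw sums supplied by SUM% and INPROD%. In the correlation module, the forced (CONST) column of ones makes XPX#(0,0) equal to N and XPX#(0,I) equal to the sum of variable I. That is what lets lines 10160-10180 form N times the sum of cross-products minus the product of the two sums. Here is a rough Python sketch of the same formulas, with lists in place of the BASIC arrays and Python's built-in `sum` in place of the 8087 routines. `describe` and `correlation` are illustrative names, not part of the program.

```python
import math


def describe(x):
    """Mean, standard deviation and variance from two running sums,
    mirroring lines 9120-9160 (SUM% supplies the sum; INPROD% of x with
    itself supplies the sum of squares)."""
    n = len(x)
    total = sum(x)
    total_sq = sum(v * v for v in x)
    mean = total / n
    variance = (total_sq - total * total / n) / (n - 1)
    return mean, math.sqrt(variance), variance


def correlation(x, y):
    """Correlation from raw sums, mirroring lines 10160-10220.
    Returns None where the BASIC code prints 'WHOOPS CONSTANT VARIABLE'."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    cov = n * sxy - sx * sy          # COV#
    v1 = n * sxx - sx * sx           # V1#
    v2 = n * syy - sy * sy           # V2#
    if v1 * v2 == 0:
        return None
    return cov / math.sqrt(v1 * v2)
```

These one-pass formulas subtract two large, nearly equal quantities when the mean is big compared with the spread. This is one reason the program keeps its sums in double precision (the # variables).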



Commercial Data Processing


The name "numeric data processor" naturally leads people to think of 
the 8087 as a tool for "scientific" rather than "business" applications. 
While the 8087's forte is certainly working with numbers, it does have 
important applications in business and commercial EDP (Electronic Data 
Processing). 



The Cookbook—Chapter 15 

Program: 

ADDSTR 

Purpose: 

Add array of integer-valued strings. 

Call: 

CALL ADDSTR(A$(0),ISPACE(0),SUM,IER,N) 

Input: 

A$—N element string array. 

ISPACE—5 element integer array; scratch space.

N—integer number of elements of A$. 

Output: 

SUM—single precision scalar; sum of VAL(A$(I)) 
IER—integer; -1 if error, 0 otherwise.

Language: 

8087/8088 assembly language. 


The 8087 is valuable in any application involving numbers. In the last 
chapter, we built a small statistical package out of the matrix routines of 
Chapters 10 and 11. Business people normally don't care about technical 
aspects of matrix inversion! However, mathematical tools such as regression
analysis (which use matrix operations internally) are a regular part
of the forecasting and planning function in every large company. The 
8087 is an important tool for anyone building software for business people 
to use. 

Typical commercial EDP applications (payroll programs and the like)
do relatively little numerical computation. Such programs spend more
time converting data from an external "ASCII" format to an internal
binary format than they spend manipulating the numbers after the
conversion. For this reason, commercial programs often avoid conversion
costs by operating directly on data stored in decimal, rather than binary,
representation. The 8087 supports such operations through its packed
decimal instructions.

Almost all commercial data processing applications are written in high-level
languages. Languages such as COBOL and PL/I allow you to operate
on decimal data. The BASIC language offered on personal computers 
rarely provides a decimal data type. In order to show off the 8087's 
prowess at decimal operations, we've written a small assembly language 
routine that replaces part of a BASIC program. 

Consider the following BASIC program which creates a string array 
filled with integers and then totals up the values in the strings. 

10 DEFINT I-N
20 DIM A$(4999)
30 N=4999
40 REM FILL UP A$ WITH INTEGERS
50 FOR I=0 TO N:A$(I)=STR$(I):NEXT I
60 REM TIME THIS PART
70 T1$=TIME$
80 SUM=0
90 FOR I=0 TO N
100 SUM=SUM+VAL(A$(I))
110 NEXT I
120 T2$=TIME$
130 PRINT N+1,SUM,T1$,T2$
140 END

Most of the work in lines 90, 100, and 110 is done by the function "VAL", which
converts strings to single precision. (If you change the array of strings,
A$, to a single precision array, A, you'll see the program's speed nearly
triple.) Assembly language subroutine ADDSTR, below, adds up a vector
of strings (representing integers) and returns a single precision sum. We
can replace lines 90-110 with ADDSTR, as in the following program.

10 DEFINT I-N
20 DIM A$(4999),ISPACE(4)
30 N=4999
40 REM FILL UP A$ WITH INTEGERS
50 FOR I=0 TO N:A$(I)=STR$(I):NEXT I
60 REM TIME THIS PART
70 T1$=TIME$
80 SUM=0
90 IER=0:NEL=N+1 'ADDSTR WANTS THE NUMBER OF ELEMENTS OF A$
100 CALL ADDSTR(A$(0),ISPACE(0),SUM,IER,NEL)
110 T2$=TIME$
120 PRINT N+1,SUM,T1$,T2$
130 END
ADDSTR processes each string in three steps. First, it finds the string
by untangling the string descriptor provided by BASIC. Second, ADDSTR
converts the string's ASCII representation to packed decimal while doing
some limited error checking. Third, ADDSTR uses the 8087 packed decimal
instructions to add up the converted values.
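The second step is the only one with any subtlety. The Python sketch below mimics it. It assumes the 8087's 10-byte packed decimal layout: 18 BCD digits in bytes 0 through 8, least significant digit in the low nibble of byte 0, and the sign in bit 7 of byte 9. The function names `to_packed_bcd` and `from_packed_bcd` are hypothetical. The rules they enforce (an optional leading minus, spaces read as zero, at most 18 digits) are the ones ADDSTR applies.

```python
def to_packed_bcd(text: str) -> bytes:
    """Encode an integer string in the 8087's 10-byte packed decimal format,
    following ADDSTR's rules. Raises ValueError where ADDSTR would set IER=-1."""
    buf = bytearray(10)
    if text.startswith("-"):
        buf[9] |= 0x80                   # sign lives in bit 7 of byte 9
        text = text[1:]
    if not 0 < len(text) <= 18:
        raise ValueError("length out of range")
    # Walk the string from its right end: least significant digit first.
    for i, ch in enumerate(reversed(text)):
        if ch == " ":
            ch = "0"                     # STR$ pads positive numbers with a space
        if not "0" <= ch <= "9":
            raise ValueError("not a digit: %r" % ch)
        digit = ord(ch) - ord("0")
        if i % 2 == 0:
            buf[i // 2] |= digit         # right (low) nibble
        else:
            buf[i // 2] |= digit << 4    # left (high) nibble, then next byte
    return bytes(buf)


def from_packed_bcd(buf: bytes) -> int:
    """Decode the 10-byte format back to an integer (the job FBLD does)."""
    value = 0
    for b in reversed(buf[:9]):          # byte 8 holds the most significant pair
        value = value * 100 + (b >> 4) * 10 + (b & 0x0F)
    return -value if buf[9] & 0x80 else value
```

For example, " 1234" encodes as the bytes 34 12 followed by eight zero bytes, which is the nibble order ADDSTR builds with its DH flag.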


;SUBROUTINE ADDSTR(A$(0),ISPACE(0),SUM,IER,N)
;ASSUMPTIONS: A$     - N LONG ARRAY OF STRINGS
;             ISPACE - 10 FREE BYTES
;             SUM    - SINGLE PRECISION ANSWER
;             IER    - INTEGER. 0 ON RETURN FOR NO ERROR
;                      -1 IF ERROR
;             N      - INTEGER NUMBER OF ELEMENTS OF A$
;
;ELEMENTS OF A$ ARE ASSUMED TO BE INTEGERS NO MORE
;THAN 18 BYTES LONG.
;
;ADDSTR ADDS UP VALUES IN A$
;
        PUBLIC  ADDSTR
CSEG    SEGMENT 'CODE'
        ASSUME  CS:CSEG
ADDSTR  PROC    FAR
        PUSH    BP
        MOV     BP,SP
        MOV     BX,[BP]+8               ;BX=ADDR(IER)
        MOV     WORD PTR [BX],0         ;ASSUME NO ERROR
        MOV     BX,[BP]+6               ;KEEP COUNT OF ARRAY
        MOV     CX,[BX]                 ;IN CX AS USUAL
        FLDZ                            ;CLEAR OUT STACK TOP
        CMP     CX,0
        JG      NOTDONE                 ;N=0?
        JMP     DONE
NOTDONE:
        MOV     BX,[BP]+14              ;BX=ADDR(A$(0))
;NOTICE BX KEEPS TRACK OF THE DESCRIPTORS OF THE STRINGS,
;NOT THE STRINGS THEMSELVES
GET_DESCRIPTOR:
;UNFORTUNATELY, THE BASIC COMPILER AND THE BASIC INTERPRETER
;STORE STRINGS DIFFERENTLY
;THE COMPILER DESCRIPTOR HAS THE STRING LENGTH IN ONE WORD
;  FOLLOWED BY THE STRING ADDRESS IN A SECOND WORD
;THE INTERPRETER DESCRIPTOR HAS THE STRING LENGTH IN ONE BYTE
;  FOLLOWED BY THE STRING ADDRESS IN A WORD
;ASSUME THIS PROGRAM IS RUN WITH COMPILED BASIC
        MOV     AX,WORD PTR [BX]        ;ASSUME COMPILER
;       MOV     AH,0
;       MOV     AL,BYTE PTR [BX]        ;IF INTERPRETER
;AX IS NUMBER OF BYTES IN STRING
;CLEAR OUT WORKSPACE
        MOV     SI,[BP]+12              ;SI=ADDR(ISPACE)
        MOV     WORD PTR [SI],0
        MOV     WORD PTR [SI]+2,0
        MOV     WORD PTR [SI]+4,0
        MOV     WORD PTR [SI]+6,0
        MOV     WORD PTR [SI]+8,0
        MOV     DI,WORD PTR [BX]+2      ;DI=ADDR(STRING(I))
;       MOV     DI,WORD PTR [BX]+1      ;IF INTERPRETER
;CHECK FIRST CHARACTER FOR MINUS SIGN
        MOV     DL,BYTE PTR [DI]        ;DL IS FIRST CHARACTER
        CMP     DL,45                   ;CHECK FOR MINUS
        JNE     NUMBER                  ;SIGN
;IT'S NEGATIVE
        OR      BYTE PTR [SI]+9,80H     ;SET SIGN BIT (BIT 7 OF BYTE 9)
        DEC     AX                      ;USED UP ONE BYTE
NUMBER:
;CHECK STRING LENGTH
        CMP     AX,0                    ;NULL STRING NOT
        JLE     ERROR                   ;ALLOWED
        CMP     AX,18                   ;AT MOST 18 DIGITS
        JG      ERROR
;NOW START AT RIGHT END OF STRING AND WORK BACKWARD
        ADD     DI,AX                   ;DI POINTS TO
        DEC     DI                      ;LAST BYTE OF STRING
        CMP     DL,45                   ;BUT TEST IF WE HAD
        JNE     L1                      ;ALREADY SUBTRACTED
        INC     DI
L1:
;WE NEED TO REMEMBER WHETHER TO PLACE DIGIT IN
;LEFT OR RIGHT NIBBLE (HALF OF BYTE)
;KEEP FLAG IN DH, 0 MEANS RIGHT, 1 MEANS LEFT
        MOV     DH,0
;NOW TRANSLATE EACH CHARACTER
NEXTNUM:
        MOV     DL,BYTE PTR [DI]        ;GET CHARACTER
        CMP     DL,32                   ;IS IT A SPACE?
        JNE     NOT_A_SPACE
        MOV     DL,48                   ;IF SO, MAKE IT ZERO
NOT_A_SPACE:
        CMP     DL,48                   ;<0?
        JL      ERROR
        CMP     DL,57                   ;>9?
        JG      ERROR
        SUB     DL,48                   ;MAKE 0-9
        CMP     DH,0                    ;RIGHT NIBBLE?
        JNE     LEFT
STOWIT:
        OR      BYTE PTR [SI],DL        ;STORE DECIMAL
        XOR     DH,1                    ;SWITCH NIBBLE
        JMP     NEXTCH
LEFT:
        SHL     DL,1                    ;GET IT TO LEFT
        SHL     DL,1                    ;NIBBLE
        SHL     DL,1
        SHL     DL,1
        OR      BYTE PTR [SI],DL        ;STORE DECIMAL
        XOR     DH,1                    ;SWITCH NIBBLE
        INC     SI                      ;NEXT BYTE
NEXTCH:
        DEC     DI                      ;NEXT CHARACTER
        DEC     AX                      ;DONE YET?
        JG      NEXTNUM                 ;MORE?
;NOW ISPACE HAS A NICE PACKED DECIMAL NUMBER IN IT
        MOV     SI,[BP]+12              ;POINT TO ISPACE AGAIN
        FBLD    TBYTE PTR [SI]          ;PUSH IT ONTO STACK
        FADDP   ST(1),ST                ;ADD INTO TOTAL
        ADD     BX,4                    ;NEXT ARRAY ELEMENT
;       ADD     BX,3                    ;IF INTERPRETER
        LOOP    GOTO_GET_DESCRIPTOR
DONE:
        MOV     SI,[BP]+10              ;SI=ADDR(SUM)
        FSTP    DWORD PTR [SI]          ;STORE AWAY SUM
        POP     BP
        FWAIT
        RET     10
GOTO_GET_DESCRIPTOR:    JMP     GET_DESCRIPTOR
ERROR:
        MOV     BX,[BP]+8               ;BX=ADDR(IER)
        MOV     WORD PTR [BX],-1        ;ERROR INDICATOR
        JMP     DONE
ADDSTR  ENDP
CSEG    ENDS
        END
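The comments at GET_DESCRIPTOR describe the one place where compiled and interpreted BASIC differ for this routine. The compiler's descriptor is a length word followed by an address word (4 bytes). The interpreter's is a length byte followed by an address word (3 bytes). A small Python sketch using the standard `struct` module decodes both layouts from raw bytes. Only the field sizes come from the listing's comments. The sample tables and the function name `walk_descriptors` are made up for illustration, and byte order is assumed little-endian, as on the 8088.

```python
import struct

# Compiled BASIC descriptor: 2-byte length, then 2-byte string address.
COMPILER = struct.Struct("<HH")
# Interpreter descriptor: 1-byte length, then 2-byte address (no padding).
INTERPRETER = struct.Struct("<BH")


def walk_descriptors(table: bytes, layout: struct.Struct, count: int):
    """Return (length, address) for `count` consecutive descriptors,
    stepping layout.size bytes each time (ADD BX,4 or ADD BX,3)."""
    return [layout.unpack_from(table, k * layout.size) for k in range(count)]
```

Stepping by the wrong size would misread every descriptor after the first. This is why the interpreter variant must change both the length fetch and the `ADD BX` increment.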




Notice how we provide the scratch space that ADDSTR needs to store 
the packed decimal value. We could have set up a 10-byte area in an 
extra segment, as we have in other programs. Instead, we get BASIC to 
pass us a 10-byte array called ISPACE. (This was mostly just as an excuse 
to show an alternative technique for finding storage for an assembly 
language program.) 

Table 15-1 provides some timing figures with and without ADDSTR. 

Routine ADDSTR took over 100 lines of assembly language code to
replace three lines of BASIC. In return for the extra work, we got a
program that runs 50 times faster than interpreted BASIC and 12 times
faster than compiled BASIC. In this example, the speed improvement
for a commercial application is the same as we found for scientific
applications earlier in the book.
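The speed ratios follow directly from the timings in Table 15-1: 64 / 1.25 = 51.2, rounded to 50 in the text, and 15 / 1.25 = 12. Here is a quick Python check, with the timings typed in by hand.

```python
# Times in seconds to add 5,000 integer strings, from Table 15-1.
TIMINGS = {"BASIC interpreter": 64.0, "BASIC compiler": 15.0, "8087 routine": 1.25}


def speedup(baseline: str, faster: str = "8087 routine") -> float:
    """How many times faster `faster` ran than `baseline`."""
    return TIMINGS[baseline] / TIMINGS[faster]


for name in ("BASIC interpreter", "BASIC compiler"):
    print(f"8087 routine is {speedup(name):.1f} times faster than the {name}")
```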

While the 8087 may never become quite so indispensable in commercial
work as it is in scientific programming, we can still expect its use to
become widespread, especially as 8087-compatible translators for
commercial programming languages appear.

Table 15-1. Speed benchmarks for packed decimal instructions
(time in seconds).

Program               Add 5,000 integer strings
BASIC interpreter     64
BASIC compiler        15
8087 routine          1.25





Postscript 


I told you a little fib in the first chapter. I said you would use the 8087
to "turn minutes into seconds." The 8087 will indeed turn minutes into
seconds, but I think you will find that the 8087's real value lies in its
ability to extend your reach. Now that your machine is many times faster, 
you will find you will want to solve problems that are many times larger— 
and probably problems with more important answers. Solutions that 
could formerly be found only on a large computer, or that weren't available
to you at all, are now within your grasp.
Appendix 1


Table A1-1. Instruction Set Reference Data. Courtesy of Intel
Corporation.


FABS          FABS (no operands)
Absolute value                Exceptions: I

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
(no operands)         14        10-17         0      0      FABS


FADD          FADD //source/destination,source
Add real                      Exceptions: I, D, O, U, P

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
//ST,ST(i)/ST(i),ST   85        70-100        0      0      FADD ST,ST(4)
short-real            105+EA    90-120+EA     2/4    4      FADD AIR_TEMP[SI]
long-real             110+EA    95-125+EA     4/6    8      FADD [BX].MEAN


FADDP         FADDP destination,source
Add real and pop              Exceptions: I, D, O, U, P

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
ST(i),ST              90        75-105        0      0      FADDP ST(2),ST


FBLD          FBLD source
Packed decimal (BCD) load     Exceptions: I

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
packed-decimal        300+EA    290-310+EA    5/7    10     FBLD YTD_SALES


FBSTP         FBSTP destination
Packed decimal (BCD) store and pop  Exceptions:

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
packed-decimal        530+EA    520-540+EA    6/8    12     FBSTP [BX].FORECAST


FCHS          FCHS (no operands)
Change sign                   Exceptions: I

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
(no operands)         15        10-17         0      0      FCHS
Table A1-1. Instruction set reference data (continued). Courtesy of
Intel Corporation.


FCLEX/FNCLEX  FCLEX (no operands)
Clear exceptions              Exceptions: None

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
(no operands)



FCOM          FCOM //source
Compare real                  Exceptions: I, D

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
//ST(i)               45        40-50         0             FCOM ST(1)
short-real            65+EA     60-70+EA      2/4           FCOM [BP].UPPER_LIMIT
long-real             70+EA     65-75+EA      4/6           FCOM WAVELENGTH


FCOMP         FCOMP //source
Compare real and pop          Exceptions: I, D

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
//ST(i)               47        42-52         0      0      FCOMP ST(2)
short-real            68+EA     63-73+EA      2/4    4      FCOMP [BP+2].N_READINGS
long-real             72+EA     67-77+EA      4/6    8      FCOMP DENSITY


FCOMPP        FCOMPP (no operands)
Compare real and pop twice    Exceptions: I, D

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
(no operands)                                 0      0


FDECSTP       FDECSTP (no operands)
Decrement stack pointer       Exceptions: None

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
(no operands)



FDISI/FNDISI  FDISI (no operands)
Disable interrupts            Exceptions: None

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
(no operands)         5         2-8           0      0      FDISI
Table A1-1. Instruction set reference data (continued). Courtesy of
Intel Corporation.


FDIV          FDIV //source/destination,source
Divide real                   Exceptions: I, D, Z, O, U, P

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
//ST(i),ST            198       193-203       0
short-real            220+EA    215-225+EA    2/4
long-real             225+EA    220-230+EA    4/6
Coding example: FDIV DISTANCE


FDIVP         FDIVP destination,source
Divide real and pop           Exceptions: I, D, Z, O, U, P
Coding example: FDIVP ST(4),ST


FDIVR         FDIVR //source/destination,source
Divide real reversed          Exceptions: I, D, Z, O, U, P

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
//ST,ST(i)/ST(i),ST   199       194-204       0      0      FDIVR ST(2),ST
short-real            221+EA    216-226+EA    2/4    6      FDIVR [BX].PULSE_RATE
long-real             226+EA    221-231+EA    4/6    8      FDIVR RECORDER.FREQUENCY


FDIVRP        FDIVRP destination,source
Divide real reversed and pop  Exceptions: I, D, Z, O, U, P
Coding example: FDIVRP ST(1),ST


FENI/FNENI    FENI (no operands)
Enable interrupts             Exceptions: None

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
(no operands)



FFREE         FFREE destination
Free register                 Exceptions: None

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
ST(i)                 11        9-16          0      0      FFREE ST(1)
Table A1-1. Instruction set reference data (continued). Courtesy of
Intel Corporation.


FIADD         FIADD source
Integer add                   Exceptions: I, D, O, P

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
word-integer          120+EA    102-137+EA    1/2           FIADD DISTANCE_TRAVELLED
short-integer         125+EA    108-143+EA    2/4           FIADD PULSE_COUNT[SI]


FICOM         FICOM source
Integer compare               Exceptions: I, D

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
word-integer                    72-86+EA      1/2           FICOM TOOL.N_PASSES
short-integer                   78-91+EA      2/4           FICOM [BP+4].PARM_COUNT


FICOMP        FICOMP source
Integer compare and pop       Exceptions: I, D

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
word-integer                    74-88+EA      1/2           FICOMP [BP].LIMIT[SI]
short-integer                   80-93+EA      2/4           FICOMP N_SAMPLES


FIDIV         FIDIV source
Integer divide                Exceptions: I, D, Z, O, U, P

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
word-integer          230+EA    224-238+EA    1/2           FIDIV SURVEY.OBSERVATIONS
short-integer         236+EA    230-243+EA    2/4           FIDIV RELATIVE_ANGLE[DI]


FIDIVR        FIDIVR source
Integer divide reversed       Exceptions: I, D, Z, O, U, P

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
word-integer          230+EA    225-239+EA    1/2           FIDIVR [BP].X_COORD
short-integer         237+EA    231-245+EA    2/4           FIDIVR FREQUENCY


FILD          FILD source
Integer load                  Exceptions: I

                      Execution Clocks        Transfers
Operands              Typical   Range         8086   8088   Coding Example
word-integer          50+EA     46-54+EA                    FILD [BX].SEQUENCE
short-integer         56+EA     52-60+EA                    FILD STANDOFF[DI]
long-integer          64+EA     60-68+EA             8      FILD RESPONSE.COUNT


FIMUL

FIMUL source
Integer multiply
Exceptions: I, D, O, P

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
word-integer           130+EA      124-138+EA      1/2             FIMUL BEARING
short-integer          136+EA      130-144+EA      2/4             FIMUL POSITION.Z_AXIS


FINCSTP

FINCSTP (no operands)
Increment stack pointer
Exceptions: None

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
(no operands)          9           6-12            0       0       FINCSTP


FINIT/FNINIT

FINIT (no operands)
Initialize processor
Exceptions: None

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
(no operands)          5                           0       0       FINIT


FIST

FIST destination
Integer store
Exceptions: I, P

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
word-integer           86+EA       80-90+EA        2/4     4       FIST OBS.COUNT[SI]
short-integer          88+EA       82-92+EA        3/5     6       FIST [BP].FACTORED_PULSES


FISTP

FISTP destination
Integer store and pop
Exceptions: I, P

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
word-integer           88+EA       82-92+EA        2/4     4       FISTP [BX].ALPHA_COUNT[SI]
short-integer          90+EA       84-94+EA        3/5     6       FISTP CORRECTED_TIME
long-integer           100+EA      94-105+EA       5/7     10      FISTP PANEL.N_READINGS


FISUB

FISUB source
Integer subtract
Exceptions: I, D, O, P

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
word-integer           120+EA      102-137+EA      1/2             FISUB BASE_FREQUENCY
short-integer          125+EA      108-143+EA      2/4             FISUB TRAIN_SIZE[DI]


FISUBR

FISUBR source
Integer subtract reversed
Exceptions: I, D, O, P

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
word-integer           120+EA      103-139+EA      1/2             FISUBR FLOOR[BX][SI]
short-integer          125+EA      109-144+EA      2/4             FISUBR BALANCE


FLD

FLD source
Load real
Exceptions: I, D

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
ST(i)                  20          17-22           0       0       FLD ST(0)
short-real             43+EA       38-56+EA        2/4             FLD READING[SI].PRESSURE
long-real              46+EA       40-60+EA                        FLD [BP].TEMPERATURE
temp-real              57+EA       53-65+EA                10      FLD SAVEREADING


FLDCW

FLDCW source
Load control word
Exceptions: None

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
2-bytes                10+EA       7-14+EA         1/2     2       FLDCW CONTROL_WORD


FLDENV

FLDENV source
Load environment
Exceptions: None

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
14-bytes               40+EA       35-45+EA        7/9     14      FLDENV [BP+6]


FLDLG2

FLDLG2 (no operands)
Load log₁₀ 2
Exceptions: I

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
(no operands)          21          18-24           0       0       FLDLG2


FLDLN2

FLDLN2 (no operands)
Load logₑ 2
Exceptions: I

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
(no operands)          20          17-23           0       0       FLDLN2


FLDL2E

FLDL2E (no operands)
Load log₂ e
Exceptions: I

Operands: (no operands)

FLDL2T

FLDL2T (no operands)
Load log₂ 10
Exceptions: I

Operands: (no operands)

FLDPI

FLDPI (no operands)
Load π
Exceptions: I

Operands: (no operands)

FLDZ

FLDZ (no operands)
Load +0.0
Exceptions: I

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
(no operands)                      11-17

FLD1

FLD1 (no operands)
Load +1.0
Exceptions: I

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
(no operands)          18          15-21           0       0       FLD1


FMUL

FMUL //source/destination,source
Multiply real
Exceptions: I, D, O, U, P

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
//ST(i),ST/ST,ST(i)¹   97          90-105          0       0       FMUL ST,ST(3)
//ST(i),ST/ST,ST(i)    138         130-145         0       0       FMUL ST,ST(3)
short-real             118+EA      110-125+EA      2/4     4       FMUL SPEED_FACTOR
long-real¹             120+EA      112-126+EA      4/6     8       FMUL [BP].HEIGHT
long-real              161+EA      154-168+EA      4/6     8       FMUL [BP].HEIGHT

¹ Occurs when one or both operands is “short” (it has 40 trailing zeros in its fraction, e.g., because it was loaded from a short-real memory operand).
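The footnote's “short” case can be checked numerically. The Python sketch below is illustrative only and is not a routine from the book. It assumes Python floats are IEEE doubles. It widens a normal, nonzero value to the 64-bit significand (an explicit integer bit plus 63 fraction bits) of the 8087's temporary-real format, then counts the trailing zero bits. A value rounded through a short real keeps only 24 significant bits, so at least 40 of its 64 significand bits are zero. All function names are hypothetical.

```python
import struct

def temp_real_significand(x: float) -> int:
    """Hypothetical helper: the 64-bit significand (explicit integer bit plus
    63 fraction bits) of a normal, nonzero double x widened to temporary real."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    fraction52 = bits & ((1 << 52) - 1)
    # Integer bit at bit 63, the 52 double fraction bits below it, 11 zero bits at the bottom.
    return (1 << 63) | (fraction52 << 11)

def trailing_zero_bits(n: int) -> int:
    """Number of trailing zero bits of a positive integer."""
    return (n & -n).bit_length() - 1

def via_short_real(x: float) -> float:
    """Round x to single precision, as loading it from a short-real operand would."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def is_short_operand(x: float) -> bool:
    """True if the significand has at least 40 trailing zeros (the footnote's case)."""
    return trailing_zero_bits(temp_real_significand(x)) >= 40

print(trailing_zero_bits(temp_real_significand(via_short_real(0.1))))  # 40
print(is_short_operand(0.1))                                          # False
```

The double 0.1 carries more than 24 significant bits, so it does not qualify. The same value after a round trip through single precision does qualify.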


FMULP

FMULP destination,source
Multiply real and pop
Exceptions: I, D, O, U, P

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
ST(i),ST¹              100         94-108          0       0       FMULP ST(1),ST
ST(i),ST               142         134-148         0       0       FMULP ST(1),ST

¹ Occurs when one or both operands is “short” (it has 40 trailing zeros in its fraction, e.g., because it was loaded from a short-real memory operand).



FNOP

FNOP (no operands)
No operation
Exceptions: None

Operands: (no operands)


FPATAN

FPATAN (no operands)
Partial arctangent
Exceptions: U, P (operands not checked)

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
(no operands)          650         250-800         0       0       FPATAN



FPREM

FPREM (no operands)
Partial remainder
Exceptions: I, D, U

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
(no operands)                                      0       0       FPREM


FPTAN

FPTAN (no operands)
Partial tangent
Exceptions: I, P (operands not checked)

Operands: (no operands)


FRNDINT

FRNDINT (no operands)
Round to integer
Exceptions: I, P

Operands: (no operands)


FRSTOR

FRSTOR source
Restore saved state
Exceptions: None

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
94-bytes               210+EA      205-215+EA      47/49   96      FRSTOR [BP]


FSAVE/FNSAVE

FSAVE destination
Save state
Exceptions: None

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
94-bytes                                           48/50   94      FSAVE [BP]


FSCALE

FSCALE (no operands)
Scale
Exceptions: I, O, U

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
(no operands)          35          32-38           0       0       FSCALE


FSQRT

FSQRT (no operands)
Square root
Exceptions: I, D, P

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
(no operands)          183         180-186         0       0       FSQRT


FST

FST destination
Store real
Exceptions: I, O, U, P

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
ST(i)                  18          15-22           0       0       FST ST(3)
short-real             87+EA       84-90+EA        3/5     6       FST CORRELATION[DI]
long-real              100+EA      96-104+EA       5/7     10      FST MEAN_READING


FSTCW/FNSTCW

FSTCW destination
Store control word
Exceptions: None

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
2-bytes                15+EA       12-18+EA        2/4     4       FSTCW SAVE_CONTROL


FSTENV/FNSTENV

FSTENV destination
Store environment
Exceptions: None

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
14-bytes               45+EA       40-50+EA        8/10    16      FSTENV [BP]


FSTP

FSTP destination
Store real and pop
Exceptions: I, O, U, P

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
ST(i)                  20          17-24           0       0       FSTP ST(2)
short-real             89+EA       86-92+EA        3/5     6       FSTP [BX].ADJUSTED_RPM
long-real              102+EA      98-106+EA       5/7     10      FSTP TOTAL_DOSAGE
temp-real              55+EA       52-58+EA        6/8     12      FSTP REG_SAVE[SI]


FSTSW/FNSTSW

FSTSW destination
Store status word
Exceptions: None

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
2-bytes                15+EA       12-18+EA        2/4     4       FSTSW SAVE_STATUS


FSUB

FSUB //source/destination,source
Subtract real
Exceptions: I, D, O, U, P

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
//ST,ST(i)/ST(i),ST    85          70-100          0       0       FSUB ST,ST(2)
short-real             105+EA      90-120+EA       2/4     4       FSUB BASE_VALUE
long-real              110+EA      95-125+EA       4/6     8       FSUB COORDINATE.X


FSUBP

FSUBP destination,source
Subtract real and pop
Exceptions: I, D, O, U, P

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
ST(i),ST               90          75-105          0       0       FSUBP ST(2),ST


FSUBR

FSUBR //source/destination,source
Subtract real reversed
Exceptions: I, D, O, U, P

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
//ST,ST(i)/ST(i),ST    87          70-100          0       0       FSUBR ST,ST(1)
short-real             105+EA      90-120+EA               4       FSUBR VECTOR[SI]
long-real              110+EA      95-125+EA               8       FSUBR [BX].INDEX


FSUBRP

FSUBRP destination,source
Subtract real reversed and pop
Exceptions: I, D, O, U, P

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
ST(i),ST               90          75-105          0       0       FSUBRP ST(1),ST


FTST

FTST (no operands)
Test stack top against +0.0
Exceptions: I, D

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
(no operands)          42          38-48           0       0       FTST


FWAIT

FWAIT (no operands)
(CPU) Wait while 8087 is busy
Exceptions: None (CPU instruction)

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
(no operands)          3+5n*       3+5n*           0       0       FWAIT


FXAM

FXAM (no operands)
Examine stack top
Exceptions: None

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
(no operands)          17          12-23           0       0       FXAM


FXCH

FXCH //destination
Exchange registers
Exceptions: I

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
//ST(i)                12          10-15           0       0       FXCH ST(2)


FXTRACT

FXTRACT (no operands)
Extract exponent and significand
Exceptions: I

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
(no operands)          50          27-55           0       0       FXTRACT


FYL2X

FYL2X (no operands)
Y·log₂X
Exceptions: P (operands not checked)

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
(no operands)          950         900-1100        0       0       FYL2X
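FYL2X is commonly paired with the logarithm constants above to get ordinary logarithms. With ln 2 (from FLDLN2) in ST(1) and x in ST, the result ST(1)·log₂ ST equals ln x. With log₁₀ 2 (from FLDLG2) in ST(1), the result equals log₁₀ x. The Python sketch below models only that stack arithmetic, using a list as the register stack with the last element as ST. The function names are hypothetical, and the sketch ignores the 8087's operand-range rules and exception handling.

```python
import math

LN2 = math.log(2.0)      # value FLDLN2 pushes: log_e 2
LG2 = math.log10(2.0)    # value FLDLG2 pushes: log_10 2

def fyl2x(stack: list) -> None:
    """Model FYL2X: ST(1) <- ST(1) * log2(ST), then pop. stack[-1] plays ST."""
    x = stack.pop()              # ST
    y = stack.pop()              # ST(1)
    stack.append(y * math.log2(x))

def ln_via_fyl2x(x: float) -> float:
    stack = [LN2]                # FLDLN2
    stack.append(x)              # FLD x
    fyl2x(stack)                 # FYL2X
    return stack[-1]

def log10_via_fyl2x(x: float) -> float:
    stack = [LG2, x]             # FLDLG2, then FLD x
    fyl2x(stack)                 # FYL2X
    return stack[-1]

print(ln_via_fyl2x(math.e))        # close to 1.0
print(log10_via_fyl2x(1000.0))     # close to 3.0
```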


FYL2XP1

FYL2XP1 (no operands)
Y·log₂(X + 1)
Exceptions: P (operands not checked)

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
(no operands)          850         700-1000        0       0       FYL2XP1


F2XM1

F2XM1 (no operands)
2^X - 1
Exceptions: U, P (operands not checked)

                       Execution Clocks            Transfers
Operands               Typical     Range           8086    8088    Coding Example
(no operands)          500         310-630         0       0       F2XM1
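F2XM1 is normally combined with FSCALE to raise 2 to an arbitrary power. The method splits x into an integer n and a fraction f, obtains 2^f from F2XM1 plus 1, and lets FSCALE multiply by 2^n. The Python sketch below models those steps under the following stated assumptions:

- The helper names are hypothetical.
- F2XM1's operand is kept within 0 ≤ f ≤ 0.5, the range the 8087 accepts.
- Fractions above 0.5 are handled by halving and squaring.
- The integer part is taken with floor, which is what FRNDINT gives with the rounding control set to round down.

```python
import math

def f2xm1(x: float) -> float:
    """Model F2XM1: 2**x - 1, restricted here to 0 <= x <= 0.5."""
    if not 0.0 <= x <= 0.5:
        raise ValueError("operand outside 0 <= x <= 0.5")
    return math.expm1(x * math.log(2.0))

def fscale(st: float, st1: float) -> float:
    """Model FSCALE: ST * 2**ST(1); ST(1) is assumed to hold an integer already."""
    return math.ldexp(st, int(st1))

def power_of_two(x: float) -> float:
    n = math.floor(x)               # integer part, rounding toward minus infinity
    f = x - n                       # fractional part, 0 <= f < 1
    if f <= 0.5:
        m = f2xm1(f) + 1.0          # 2**f directly
    else:
        h = f2xm1(f / 2.0) + 1.0    # 2**(f/2), with f/2 inside the allowed range
        m = h * h                   # square to get 2**f
    return fscale(m, n)             # m * 2**n

print(power_of_two(10.0))  # 1024.0
print(power_of_two(3.25))  # close to 2**3.25
```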
Table A2-1. Exception conditions and masked responses.
Courtesy of Intel Corporation.

Invalid Operation

Condition: Source register is tagged empty (usually due to stack underflow).
Masked response: Return real indefinite.

Condition: Destination register is not tagged empty (usually due to stack overflow).
Masked response: Return real indefinite (overwrite destination value).

Condition: One or both operands is a NAN.
Masked response: Return NAN with larger absolute value (ignore signs).

Condition: (Compare and test operations only): one or both operands is a NAN.
Masked response: Set condition codes “not comparable”.

Condition: (Addition operations only): closure is affine and operands are opposite-signed infinities; or closure is projective and both operands are ∞ (signs immaterial).
Masked response: Return real indefinite.

Condition: (Subtraction operations only): closure is affine and operands are like-signed infinities; or closure is projective and both operands are ∞ (signs immaterial).
Masked response: Return real indefinite.

Condition: (Multiplication operations only): ∞ × 0; or 0 × ∞.
Masked response: Return real indefinite.

Condition: (Division operations only): ∞ ÷ ∞; or 0 ÷ 0; or 0 ÷ pseudo-zero; or divisor is denormal or unnormal.
Masked response: Return real indefinite.

Condition: (FPREM instruction only): modulus (divisor) is unnormal or denormal; or dividend is ∞.
Masked response: Return real indefinite, set condition code = “complete remainder”.

Condition: (FSQRT instruction only): operand is nonzero and negative; or operand is denormal or unnormal; or closure is affine and operand is -∞; or closure is projective and operand is ∞.
Masked response: Return real indefinite.

Condition: (Compare operations only): closure is projective and ∞ is being compared with 0, a normal, or ∞.
Masked response: Set condition code = “not comparable”.

Condition: (FTST instruction only): closure is projective and operand is ∞.
Masked response: Set condition code = “not comparable”.

Condition: (FIST, FISTP instructions only): source register is empty, or a NAN, or denormal, or unnormal, or ∞, or exceeds representable range of destination.
Masked response: Store integer indefinite.

Condition: (FBSTP instruction only): source register is empty, or a NAN, or denormal, or unnormal, or ∞, or exceeds 18 decimal digits.
Masked response: Store packed decimal indefinite.

Condition: (FST, FSTP instructions only): destination is short or long real and source register is an unnormal with exponent in range.
Masked response: Store real indefinite.

Condition: (FXCH instruction only): one or both registers is tagged empty.
Masked response: Change empty register(s) to real indefinite and then perform exchange.

Denormalized Operand

Condition: (FLD instruction only): source operand is denormal.
Masked response: No special action; load as usual.

Condition: (Arithmetic operations only): one or both operands is denormal.
Masked response: Convert (in a work area) the operand to the equivalent unnormal and proceed.

Condition: (Compare and test operations only): one or both operands is denormal or unnormal (other than pseudo-zero).
Masked response: Convert (in a work area) any denormal to the equivalent unnormal; normalize as much as possible, and proceed with operation.

Zerodivide

Condition: (Division operations only): divisor = 0.
Masked response: Return ∞ signed with “exclusive or” of operand signs.

Overflow

Condition: (Arithmetic operations only): rounding is nearest or chop, and exponent of true result > 16,383.
Masked response: Return properly signed ∞ and signal precision exception.

Condition: (FST, FSTP instructions only): rounding is nearest or chop, and exponent of true result > +127 (short real destination) or > +1023 (long real destination).
Masked response: Return properly signed ∞ and signal precision exception.

Underflow

Condition: (Arithmetic operations only): exponent of true result < -16,382 (true).
Masked response: Denormalize until exponent rises to -16,382 (true), round significand to 64 bits. If denormalized rounded significand = 0, then return true 0; else, return denormal (tag = special, biased exponent = 0).

Condition: (FST, FSTP instructions only): destination is short real and exponent of true result < -126 (true).
Masked response: Denormalize until exponent rises to -126 (true), round significand to 24 bits, store true 0 if denormalized rounded significand = 0; else, store denormal (biased exponent = 0).

Condition: (FST, FSTP instructions only): destination is long real and exponent of true result < -1022 (true).
Masked response: Denormalize until exponent rises to -1022 (true), round significand to 53 bits, store true 0 if rounded denormalized significand = 0; else, store denormal (biased exponent = 0).

Precision

Condition: True rounding error occurs.
Masked response: No special action.

Condition: Masked response to overflow exception earlier in instruction.
Masked response: No special action.
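The underflow rows for FST and FSTP describe gradual underflow. A result too small for a normal short real is stored as a denormal with biased exponent 0, or as true 0 if nothing survives rounding. IEEE 754 single precision uses the same rule. On ordinary IEEE hardware, Python's struct module rounds a float to single precision when packing with the "f" format, so it can display the stored fields. The helper name is hypothetical, and the sketch assumes the default round-to-nearest mode.

```python
import struct

def short_real_fields(x: float) -> tuple:
    """Pack x as a short real (IEEE single) and return (sign, biased exponent, fraction)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

print(short_real_fields(1e-30))   # normal: biased exponent is nonzero
print(short_real_fields(1e-40))   # denormal: biased exponent 0, fraction nonzero
print(short_real_fields(1e-50))   # (0, 0, 0): rounds to true zero
```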
The Cookbook—Appendix 3

Four of the programs below convert data back and forth between the
Intel format used in the 8087 and the Microsoft format used in much
pre-8087 software. Two programs, SM2I and DM2I, convert from Microsoft
to Intel; two, SI2M and DI2M, convert from Intel to Microsoft. Two
programs, SM2I and SI2M, convert single precision data; two, DM2I and
DI2M, convert double precision data. Occasional minor loss of precision
in the conversion process is unavoidable.

The fifth program, INIT8087, initializes the 8087.

Program:  SM2I
Purpose:  Convert single precision vector from pre-8087 Microsoft format
          to Intel 8087 format.
Call:     CALL SM2I(SOURCE(0),DESTINATION(0),N).
Input:    SOURCE—single precision N-vector.
          N—integer number of elements in SOURCE.
Output:   DESTINATION—single precision N-vector.
Language: 8088 assembly language.

Program:  SI2M
Purpose:  Convert single precision vector from Intel 8087 format to
          pre-8087 Microsoft format.
Call:     CALL SI2M(SOURCE(0),DESTINATION(0),N).
Input:    SOURCE—single precision N-vector.
          N—integer number of elements in SOURCE.
Output:   DESTINATION—single precision N-vector.
Language: 8088 assembly language.

Program:  DM2I
Purpose:  Convert double precision vector from pre-8087 Microsoft format
          to Intel 8087 format.
Call:     CALL DM2I(SOURCE(0),DESTINATION(0),N).
Input:    SOURCE—double precision N-vector.
          N—integer number of elements in SOURCE.
Output:   DESTINATION—double precision N-vector.
Language: 8088 assembly language.

Program:  DI2M
Purpose:  Convert double precision vector from Intel 8087 format to
          pre-8087 Microsoft format.
Call:     CALL DI2M(SOURCE(0),DESTINATION(0),N).
Input:    SOURCE—double precision N-vector.
          N—integer number of elements in SOURCE.
Output:   DESTINATION—double precision N-vector.
Language: 8088 assembly language.

Program:  INIT8087
Purpose:  Initialize 8087.
Call:     CALL INIT8087.
Input:    none.
Output:   none.
Language: 8087/8088 assembly language.
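To cross-check the byte shuffling that SM2I and SI2M perform in the Conversion Routines listing, here is a Python sketch of the same single precision conversion. It assumes the layout the listing works with:

- In the Microsoft format, byte 3 holds the exponent and bit 7 of byte 2 holds the sign. The exponent has a bias of 129 for a 1.f significand (0 means zero), which is why the listing adjusts by 129-127.
- In the Intel format, byte 3 holds the sign and the upper seven exponent bits, and bit 7 of byte 2 holds the exponent's low bit. The bias is 127.

The function names are hypothetical. Like the assembly routines, the sketch stores zero for values too small to convert and does not check for exponent overflow.

```python
import struct

def mbf_to_ieee_single(mbf: bytes) -> bytes:
    """Microsoft single -> Intel (IEEE) single, mirroring SM2I."""
    b0, b1, b2, b3 = mbf
    sign = b2 & 0x80                          # Microsoft sign: bit 7 of byte 2
    exp = b3 - (129 - 127)                    # re-bias the exponent held in byte 3
    if exp <= 0:                              # zero or too small: store zero (SM2I's JBE Z1)
        return bytes(4)
    new_b2 = (b2 & 0x7F) | ((exp & 1) << 7)   # exponent low bit into bit 7 of byte 2
    new_b3 = sign | (exp >> 1)                # sign plus the upper 7 exponent bits
    return bytes([b0, b1, new_b2, new_b3])

def ieee_to_mbf_single(ieee: bytes) -> bytes:
    """Intel (IEEE) single -> Microsoft single, mirroring SI2M."""
    b0, b1, b2, b3 = ieee
    sign = b3 & 0x80
    exp = ((b3 << 1) & 0xFF) | (b2 >> 7)      # reassemble the 8-bit IEEE exponent
    if exp == 0:                              # true zero (denormals also become zero)
        return bytes(4)
    exp = (exp + (129 - 127)) & 0xFF          # 8-bit add, no overflow check, as in SI2M
    return bytes([b0, b1, (b2 & 0x7F) | sign, exp])

one_mbf = bytes([0x00, 0x00, 0x00, 0x81])     # 1.0 in Microsoft format
print(struct.unpack("<f", mbf_to_ieee_single(one_mbf))[0])   # 1.0
```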


If you use a version of BASIC which does not store data in Intel format, 
you must use conversion routines before and after calling 8087 routines. 
The following BASIC code provides an example. 

10 DEFINT I-N
20 DEFDBL D
30 N=100:N1=N-1
40 DIM A(N1)
50 FOR I=0 TO N1:A(I)=RND:NEXT I
60 CALL INIT8087
70 CALL SM2I(A(0),A(0),N)
80 CALL SUM(A(0),N,DSUM)
90 CALL SI2M(A(0),A(0),N)
100 I1=1:CALL DI2M(DSUM,DSUM,I1)
110 PRINT "THE SUM IS",DSUM
120 END


Conversion Routines

           PUBLIC  SM2I,SI2M,DM2I,DI2M,INIT8087

ESEG       SEGMENT 'DATA'
WS         DW      4 DUP(?)
ESEG       ENDS

CSEG       SEGMENT 'CODE'

FIRST_INST EQU     THIS WORD
; SUBROUTINE SM2I(SOURCE,DESTINATION,N)
; CONVERT MICROSOFT TO INTEL

           ASSUME  CS:CSEG,ES:ESEG
SM2I       PROC    FAR
           PUSH    BP
           MOV     BP,SP
; SET UP EXTRA SEGMENT, TAKING CARE OF RELOCATION
           PUSH    ES
           CALL    NEXTS1
NEXTS1:    POP     AX
           SUB     AX,(OFFSET NEXTS1)-(OFFSET FIRST_INST)
           MOV     CL,4
           SHR     AX,CL
           MOV     BX,CS
           ADD     BX,ESEG
           SUB     BX,CSEG
           ADD     AX,BX
           MOV     ES,AX
; ROUTINE PROPER STARTS HERE
           MOV     BX,[BP]+6          ;ADDR(N)
           MOV     CX,[BX]            ;CX = N
           JCXZ    OUT1
           MOV     SI,[BP]+10
           MOV     DI,[BP]+8
SM2ILOOP:  MOV     AX,[SI]            ;COPY SOURCE WORD 1
           MOV     [DI],AX
           MOV     DX,[SI]+2          ;MOVE WORD 2 INTO DX
           MOV     AH,DL              ;GET SIGN BIT
           AND     AH,80H
           SUB     DH,(129-127)
           JBE     Z1                 ;CHECK FOR ZERO OR CLOSE
           SHR     DH,1
           JC      SET1
           AND     DL,7FH             ;BIT 7 OFF
           JMP     L1
SET1:      OR      DL,80H             ;BIT 7 ON
L1:        AND     DH,7FH             ;SET SIGN
           OR      DH,AH              ;BIT
           MOV     [DI]+2,DX          ;STUFF ANSWER AWAY
           JMP     LOOPBOT1
Z1:        MOV     WORD PTR [DI],0    ;MAKE IT ZERO
           MOV     WORD PTR [DI]+2,0
LOOPBOT1:  ADD     SI,4
           ADD     DI,4
           LOOP    SM2ILOOP
OUT1:      POP     ES
           POP     BP
           RET     6
SM2I       ENDP



; SUBROUTINE SI2M(SOURCE,DESTINATION,N)
; CONVERT INTEL TO MICROSOFT

           ASSUME  CS:CSEG,ES:ESEG
SI2M       PROC    FAR
           PUSH    BP
           MOV     BP,SP
; SET UP EXTRA SEGMENT, TAKING CARE OF RELOCATION
           PUSH    ES
           CALL    NEXTS2
NEXTS2:    POP     AX
           SUB     AX,(OFFSET NEXTS2)-(OFFSET FIRST_INST)
           MOV     CL,4
           SHR     AX,CL
           MOV     BX,CS
           ADD     BX,ESEG
           SUB     BX,CSEG
           ADD     AX,BX
           MOV     ES,AX
; ROUTINE PROPER STARTS HERE
           MOV     BX,[BP]+6          ;ADDR(N)
           MOV     CX,[BX]            ;CX = N
           JCXZ    OUT2
           MOV     SI,[BP]+10
           MOV     DI,[BP]+8
SI2MLOOP:  MOV     AX,[SI]            ;COPY SOURCE WORD 1
           MOV     [DI],AX
           MOV     DX,[SI]+2          ;WORD 2 INTO DX
           MOV     AH,DH              ;GET SIGN BIT
           AND     AH,80H
           SHL     DH,1
           TEST    DL,80H             ;LOOK AT LSB BIT
           JZ      L2
           OR      DH,1               ;SET LSB
L2:        CMP     DH,0               ;CHECK FOR TRUE ZERO
           JE      Z2
           ADD     DH,(129-127)
           AND     DL,7FH             ;BIT 7 OFF
           OR      DL,AH              ;SET SIGN BIT
           MOV     [DI]+2,DX
           JMP     LOOPBOT2
Z2:        MOV     WORD PTR [DI],0    ;SET TO ZERO
           MOV     WORD PTR [DI]+2,0
LOOPBOT2:  ADD     SI,4
           ADD     DI,4
           LOOP    SI2MLOOP
OUT2:      POP     ES
           POP     BP
           RET     6
SI2M       ENDP


; SUBROUTINE DM2I(SOURCE,DESTINATION,N)
; CONVERT MICROSOFT TO INTEL

           ASSUME  CS:CSEG,ES:ESEG
DM2I       PROC    FAR
           PUSH    BP
           MOV     BP,SP
; SET UP EXTRA SEGMENT, TAKING CARE OF RELOCATION
           PUSH    ES
           CALL    NEXTS3
NEXTS3:    POP     AX
           SUB     AX,(OFFSET NEXTS3)-(OFFSET FIRST_INST)
           MOV     CL,4
           SHR     AX,CL
           MOV     BX,CS
           ADD     BX,ESEG
           SUB     BX,CSEG
           ADD     AX,BX
           MOV     ES,AX
; ROUTINE PROPER STARTS HERE
           MOV     BX,[BP]+6          ;ADDR(N)
           MOV     CX,[BX]            ;CX = N
           JCXZ    AROUND3
           JMP     LL3
AROUND3:   JMP     OUT3
LL3:       MOV     SI,[BP]+10
           MOV     DI,[BP]+8
DM2ILOOP:  MOV     AX,[SI]            ;COPY SOURCE INTO
           MOV     ES:WS,AX           ;WORK AREA
           MOV     AX,[SI]+2
           MOV     ES:WS+2,AX
           MOV     AX,[SI]+4
           MOV     ES:WS+4,AX
           MOV     AX,[SI]+6
           MOV     ES:WS+6,AX
           MOV     DH,[SI]+6          ;GET SIGN BIT INTO
           AND     DH,80H             ;DH
           SUB     AX,AX              ;CLEAR AX REGISTER
           MOV     AL,[SI]+7          ;GET EXPONENT
           CMP     AL,0               ;CHECK FOR ZERO
           JE      Z3
           ADD     AX,(1023-129)      ;CORRECT BIAS
           SHR     DH,1               ;SHIFT SIGN BIT INTO
           SHR     DH,1               ;RIGHT POSITION
           SHR     DH,1
           SHR     DH,1
           OR      AH,DH              ;SET SIGN BIT
           AND     BYTE PTR ES:WS+6,7FH   ;CLEAR OLD SIGN BIT
           SHR     AX,1
           JNC     L3
           OR      BYTE PTR ES:WS+6,80H   ;TURN ON LSB BIT
L3:        MOV     BX,3
LA3:       SHR     AX,1
           RCR     BYTE PTR ES:WS+6,1
           RCR     WORD PTR ES:WS+4,1
           RCR     WORD PTR ES:WS+2,1
           RCR     WORD PTR ES:WS,1
           DEC     BX
           JG      LA3
           MOV     BYTE PTR ES:WS+7,AL    ;ALL SET IN WORK AREA NOW
           MOV     AX,ES:WS           ;STICK IN
           MOV     [DI],AX            ;DESTINATION
           MOV     AX,ES:WS+2
           MOV     [DI]+2,AX
           MOV     AX,ES:WS+4
           MOV     [DI]+4,AX
           MOV     AX,ES:WS+6
           MOV     [DI]+6,AX
           JMP     LOOPBOT3
Z3:        MOV     WORD PTR [DI],0    ;STORE AWAY ZERO
           MOV     WORD PTR [DI]+2,0
           MOV     WORD PTR [DI]+4,0
           MOV     WORD PTR [DI]+6,0
LOOPBOT3:  ADD     SI,8
           ADD     DI,8
           LOOP    DM2ILABEL
           JMP     OUT3
DM2ILABEL: JMP     DM2ILOOP
OUT3:      POP     ES
           POP     BP
           RET     6
DM2I       ENDP
; SUBROUTINE DI2M(SOURCE,DESTINATION,N)
; CONVERT INTEL TO MICROSOFT

           ASSUME  CS:CSEG,ES:ESEG
DI2M       PROC    FAR
           PUSH    BP
           MOV     BP,SP
; SET UP EXTRA SEGMENT, TAKING CARE OF RELOCATION
           PUSH    ES
           CALL    NEXTS4
NEXTS4:    POP     AX
           SUB     AX,(OFFSET NEXTS4)-(OFFSET FIRST_INST)
           MOV     CL,4
           SHR     AX,CL
           MOV     BX,CS
           ADD     BX,ESEG
           SUB     BX,CSEG
           ADD     AX,BX
           MOV     ES,AX
; ROUTINE PROPER STARTS HERE
           MOV     BX,[BP]+6          ;ADDR(N)
           MOV     CX,[BX]            ;CX = N
           JCXZ    AROUND4
           JMP     LL4
AROUND4:   JMP     OUT4
LL4:       MOV     SI,[BP]+10
           MOV     DI,[BP]+8
DI2MLOOP:  MOV     AX,[SI]            ;COPY SOURCE INTO
           MOV     ES:WS,AX           ;WORK AREA
           MOV     AX,[SI]+2
           MOV     ES:WS+2,AX
           MOV     AX,[SI]+4
           MOV     ES:WS+4,AX
           MOV     AX,[SI]+6
           MOV     ES:WS+6,AX
           MOV     DH,[SI]+7          ;GET SIGN BIT INTO
           AND     DH,80H             ;DH
           MOV     AX,[SI]+6          ;GET EXPONENT
           AND     AX,0111111111110000B
           SHR     AX,1
           SHR     AX,1
           SHR     AX,1
           SHR     AX,1               ;NOW EXPO IS IN RIGHT SPOT
           CMP     AX,(1023-129)      ;CHECK FOR ZERO
           JBE     Z4
           SUB     AX,(1023-129)      ;CORRECT BIAS
           MOV     BYTE PTR ES:WS+7,AL    ;STORE AWAY EXPONENT
           SHR     DH,1               ;SHIFT SIGN BIT INTO
           SHR     DH,1               ;RIGHT POSITION
           SHR     DH,1
           AND     BYTE PTR ES:WS+6,0FH   ;CLEAR OLD SIGN BIT
           OR      BYTE PTR ES:WS+6,DH    ;SET SIGN BIT
           MOV     BX,3
LA4:       SHL     WORD PTR ES:WS,1
           RCL     WORD PTR ES:WS+2,1
           RCL     WORD PTR ES:WS+4,1
           RCL     BYTE PTR ES:WS+6,1
           DEC     BX
           JG      LA4
           MOV     AX,ES:WS           ;STICK IN
           MOV     [DI],AX            ;DESTINATION
           MOV     AX,ES:WS+2
           MOV     [DI]+2,AX
           MOV     AX,ES:WS+4
           MOV     [DI]+4,AX
           MOV     AX,ES:WS+6
           MOV     [DI]+6,AX
           JMP     LOOPBOT4
Z4:        MOV     WORD PTR [DI],0    ;STORE AWAY ZERO
           MOV     WORD PTR [DI]+2,0
           MOV     WORD PTR [DI]+4,0
           MOV     WORD PTR [DI]+6,0
LOOPBOT4:  ADD     SI,8
           ADD     DI,8
           LOOP    DI2MLABEL
           JMP     OUT4
DI2MLABEL: JMP     DI2MLOOP
OUT4:      POP     ES
           POP     BP
           RET     6
DI2M       ENDP

; SUBROUTINE INIT8087

INIT8087   PROC    FAR
           FINIT
           RET
INIT8087   ENDP

CSEG       ENDS
           END
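The double precision routines follow the same plan with a wider field.

- In the Microsoft double format handled by DM2I and DI2M, byte 7 holds the exponent (bias 129 for a 1.f significand) and bit 7 of byte 6 holds the sign, with 55 fraction bits below it.
- The Intel (IEEE) long real has a 52-bit fraction and an 11-bit exponent with bias 1023.

DM2I therefore adds 1023-129 to the exponent and shifts away the three lowest fraction bits, which is where precision can be lost. DI2M shifts the fraction back up three bits. The Python sketch below performs the same bit rearrangement on integers. Its function names are hypothetical. Like the listing, it stores zero for tiny values and does not check for exponent overflow.

```python
import struct

BIAS_SHIFT = 1023 - 129          # exponent adjustment used by DM2I and DI2M

def mbf_to_ieee_double(mbf: bytes) -> bytes:
    """Microsoft double -> Intel (IEEE) long real, mirroring DM2I."""
    word = int.from_bytes(mbf, "little")
    exp = word >> 56                         # byte 7: Microsoft exponent
    if exp == 0:                             # Microsoft zero
        return bytes(8)
    sign = (word >> 55) & 1                  # bit 7 of byte 6
    fraction55 = word & ((1 << 55) - 1)      # 55 fraction bits below the sign
    fraction52 = fraction55 >> 3             # drop the three lowest bits (truncation)
    ieee = (sign << 63) | ((exp + BIAS_SHIFT) << 52) | fraction52
    return ieee.to_bytes(8, "little")

def ieee_to_mbf_double(ieee: bytes) -> bytes:
    """Intel (IEEE) long real -> Microsoft double, mirroring DI2M."""
    word = int.from_bytes(ieee, "little")
    sign = word >> 63
    ieee_exp = (word >> 52) & 0x7FF
    if ieee_exp <= BIAS_SHIFT:               # too small for the Microsoft format: store zero
        return bytes(8)
    exp = (ieee_exp - BIAS_SHIFT) & 0xFF     # only the low byte is kept, as in DI2M
    fraction52 = word & ((1 << 52) - 1)
    mbf = (exp << 56) | (sign << 55) | (fraction52 << 3)
    return mbf.to_bytes(8, "little")

one_mbf = bytes([0] * 7 + [0x81])            # 1.0 in Microsoft double format
print(struct.unpack("<d", mbf_to_ieee_double(one_mbf))[0])   # 1.0
```

A Microsoft value whose three lowest fraction bits are nonzero converts to the same Intel value as one with those bits clear, which is the precision loss noted above.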
16-bit personal computers, 7

8086 microprocessor chips, 5 
addressing of words in, 86 
speed of, 22 

8087 floating point library, 18-19 

8087 Numeric Data Processor chips, 7-8 
accuracy of, 2-3, 15

automatic error handling in, 104 
BASIC and, 69-81 
benchmarks for, 21-24 
calculation speed improved by, 13-14 
control options on, 27-29 
as co-processors, 25-26 
data types on, 9-10, 30-32, 35-37 
hardware requirements of, 5 
instruction classes on, 8-9 
instruction sets for, 40-50, 179-200
registers in, 26-27, 39-40 
representation of numbers on, 32-35 
software compatibility for, 11-12, 15-16
speed of, 3-4 

statistical analysis program for, 227-236 
subroutines for solving systems of linear 
equations using, 153-166 
used for commercial data processing, 237 

8088 microprocessor chips, 5, 7-8 
8087 as co-processor with, 25-26 
assembly language programming for, 53-68

benchmarks on, 22 
ESCAPE instructions on, 10 
matrix programs and, 177 
speed of, 3 

unmasked exceptions passed to, 198 

Access to memory for matrix operations, 
115-118 

Accumulator (AX register), 56 

Accuracy, 2-3, 15, 28-29, 89-91
of differentiation, 203, 205 
of double precision numbers, 89 
errors in, 103-104 
of matrix operations, 176 

ADD instruction, 59 

Addition instructions, 45-46 

Addition programs, 50-51, 86-87 
for vectors of strings, 238-241 

Addressing 
by 8088 chip, 8, 25 
of memory, 56-58 
of segments, 65, 74-77 


Affine closure, 28 
Algebra 

Gaussian elimination, 138-139 
matrix, 114, 133-134
matrix manipulation in, 134-137 
non-linear methods for, 201-211 
solving multiple linear systems, 137-138, 
143-178 

see also Equations 
Algorithms, 12 
AND instruction, 60 
APU 111 

Apple computers, 5 
benchmarks on, 23 
Architecture of 8087 chip, 25-37 
Arguments 

double precision, 89-91 
formats for, 43-44 
to transcendental instructions, 183 
Arithmetic instructions, 8, 43-47, 181-182
Arrays 

copying of, 96-99
indexing of, 86-88
matrices, 114-115 
in memory, 56, 57 
multiple, indexing of, 91-93 
see also Matrix operations 
ASCII code, 16, 32 
Assembler directives, 58, 67-68 
Assemblers 

object modules created by, 77 
WAIT instructions inserted by, 26 
Assembly language 
8087 accuracy available to, 15 
for BASIC, 19-20 
loading of programs in, 77 
matrix-handling programs in, 177 
in packaged programs, 16 
program for addition of vectors of strings 
in, 238-241 

programming in, 14, 53-68 
ASSUME directive, 67 
AX (accumulator) register, 56 

Band matrices, 176 
Base register, 56 
BASIC 

8087 chip and, 69-81 
assembly language modules for, 19-20 
benchmark programs in, 22 
compiled, 18-19 
compilers and interpreters for, 13
data types available in, 9 
error handling in, 104 
error messages in, 100 
indexing in, 57 
interpreted, 17-18 
machine language and, 43 
matrix operations in, 111, 114-116, 177
non-linear methods in, 201 
packaged programs in, 16 
for use with 8087, 2 
BASIC compilers 

with 8087 floating point library, 18-19 
benchmarks on, 22 
interactive use of, 80-81 
loading subroutines into, 78-79 
program initialization in, 224-225 
BASIC interpreters, 17-18 
benchmarks on, 22 
high overhead on, 4 
installation of 8087 chips and, 5 
interactive use of, 79-80 
loading subroutines into, 77-78 
program initialization in, 224-225 
Benchmarks, 21-24 
Binary files, 16, 18
Binary operations, 95 
Binary point, 32 
BLOAD command, 74, 78
BP (stack pointer) register, 56, 57, 72 
Branching instructions, 64-65 
in assembly language, 62-64 
from subroutines, 66-67 
BSAVE command, 78 
BX (index) register, 56, 57 
Bytes, addressing of, 56 

Calculation time, 13, 17
using compiled BASIC, 19 
Calculus, 207 
CALL instruction 
in assembly language, 66 
in BASIC, 69 

Canned (packaged) programs, 15-16, 218-227

Central Processing Units (CPUs), 7 
8087 chip as co-processor in, 25 
Circuit boards, 5 
Clock speeds, 22 
CMP instruction, 61 
Code segment (CS) register, 65, 72 
Column vectors, 116 
Commands 

in assembly language, 58 
in canned statistical analysis program, 219-220

Commercial data processing, 237-242 


Comparison instructions, 9, 48-50 
in assembly language, 61-62 
Compatibility
of 8087 and non-8087 versions of BASIC, 18
of software, 11-12, 15-16
Compiled BASIC, see BASIC compilers
Compilers, 13
8087 accuracy available to, 15 
with 8087 floating point library, 18-19 
for 8087 "native code," 19 
benchmarks using, 22 
see also BASIC compilers 
Condition code bits, 26, 64 
examining, 48, 50 
Constants, 9 
instructions for, 182 
Control word (register), 27-28
processor control instructions for, 196 
Correlation, 215-216 
in canned statistical analysis program, 222-223
Cosines, 190 
Counter testing, 177-178 
Count (CX) register, 56
Conversions
between ASCII and binary representations, 32
of pre-8087 software, 17 
Co-processors, 7-8 
organization of, 25-26 
Copying of arrays, 96-99 
CPUs (central processing units), 7 
8087 chip as co-processor in, 25 
Crout decomposition, 150-153, 176, 177
back substitution after, 166-172 
subroutines using, 153-166 
CS (code segment) register, 65, 72 
CX (count) register, 56 

Data
in canned statistical analysis program, 220-222
invalid, errors from, 109
representation of, 11-12, 16, 32-35
rules for handling of, 40-41
special types of, 35-37 
storage of, in canned programs, 218-219 
types of, 9-10, 30-32 
Data definition, 58-59 
Data processing, commercial, 237-242 
Data registers, 8, 26 
Data segment (DS) register, 65, 129
Data transfer instructions, 9, 40-43 
Debugging, 100 
DEBUG utility, 77 
DEC-2060, 23 
Decimal data, 10
see also Packed decimal data 
DEC instruction, 60 
DEF FN statement, 201
Denormal data, 36 
Dependent variables, 216 
Derivatives, 202-205, 207, 210, 211 
Descriptive statistics, 214-215 
in canned statistical analysis program, 222 
Diagonal matrices, 176 
Differentiation, numerical, 202-205, 207 
Digital Equipment Corporation, 23 
Directives, 58, 67-68 
DI (index) register, 56, 57 
Disk storage, 221, 222 
Displacement (memory location), 56, 57 
Division instructions, 43, 47 
Double precision (long real) data, 9, 15, 31, 33
in 8087 and non-8087 versions of BASIC, 18
arguments in, 89-91
DS (data segment) register, 65, 129
DX register, 56 

EDP (Electronic Data Processing), 237-238 
Effective addresses, 65 
Element-by-element matrix operations, 119-120
END directive, 68 
ENDP directive, 67-68 
ENDS directive, 67 
Equations
Crout decomposition of, 149-153
Gaussian elimination for, 138-139
linear, 114-115, 133, 143, 175-176
linear, 8087 subroutines for solving, 153-166
linear, back substitution after Crout reduction for, 166-172
linear, zero pivots in, 147-149
LU decomposition of, 149-150 
manipulation of, 133-134 
matrix inversions for, 173-175 
matrix manipulation for, 134-137 
multiple linear, solving, 137-138 
multiple regression, 216, 217 
non-linear, solving, 207-209 
Error conditions, 3 
duplicate file names, 221 
exceptions, 26, 29 
overflows, 18
from programming, 99-100
from subroutines, 99
from use of subroutines, 100
Error handling
in BASIC, 104
in canned statistical analysis program, 226
Errors
correction of, in canned statistical analysis program, 221, 226
in precision, 103-109
Error terms, 216 
ESCAPE instructions, 10, 11, 25
ES (extra segment) register, 65 
Exception-handling instructions, 198-199 
Exception-handling software, 104 
Exceptions, 26, 29 
Execution pointers (registers), 27 
Explicit operands, 45 
Exponentiation, 186-188 
Exponents, 33-34 
Extra segment (ES) register, 65 
EXTRN directive, 68 

F2XM1 instruction, 183, 186
FABS instruction, 47 
FADD instruction, 45, 46 
FADDP instruction, 46 
FAR procedures, 127, 158
FBLD instruction, 43 
FBSTP instruction, 43 
FCHS instruction, 47 
FCLEX instruction, 198-199 
FCOM instruction, 48 
FCOMP instruction, 48 
FCOMPP instruction, 49 
FDECSTP instruction, 199 
FDISI instruction, 198 
FDIV instruction, 47 
FDIVP instruction, 47 
FDIVR instruction, 47 
FDIVRP instruction, 47 
FENI instruction, 198 
FFREE instruction, 199 
FIADD instruction, 46 
FICOM instruction, 49 
FICOMP instruction, 49 
FIDIV instruction, 47 
FIDIVR instruction, 47 
FILD instruction, 42 
Files, binary, 16, 18
FIMUL instruction, 46 
FINCSTP instruction, 199 
FINIT instruction, 197 
FIST instruction, 42 
FISTP instruction, 42
FISUB instruction, 46 
FISUBR instruction, 46 
Flag register, 61 
Flags, 61, 64
in canned statistical analysis program, 223
FLD1 instruction, 183
FLDCW instruction, 196
FLDENV instruction, 199 
FLD instruction, 41 
FLDL2E instruction, 183 
FLDL2T instruction, 183 
FLDLG2 instruction, 183 
FLDLN2 instruction, 183 
FLDPI instruction, 183 
FLDZ instruction, 47, 182
Floating point libraries, 18-19 
Floating point numbers, 29-30 
in 8087 and non-8087 versions of BASIC, 18
errors avoided by using, 109 
representation of, 32-34 
FMUL instruction, 46 
FMULP instruction, 46
FNOP instruction, 199 
FPATAN instruction, 184-185, 194
FPREM instruction, 181-182
FPTAN instruction, 184, 188
FRNDINT instruction, 181
FRSTOR instruction, 197
FSAVE instruction, 196
FSCALE instruction, 181, 186
FSQRT instruction, 47 
FSTCW instruction, 196 
FSTENV instruction, 199 
FST instruction, 41-42 
FSTP instruction, 42 
FSTSW instruction, 50, 64
FSUB instruction, 46 
FSUBP instruction, 46 
FSUBR instruction, 46 
FSUBRP instruction, 46 
FTST instruction, 49 
Functions
differentiation of, 202-205 
for floating point conversions, 18 
integration of, 205-207 
inverse trigonometric, 194-195 
non-linear, 201-202 
non-linear, optimizing, 209-211 
non-linear, solving, 207-209 
trigonometric, 188-194 
FWAIT instruction, 26, 64, 196 
FXAM instruction, 49 
FXCH instruction, 42 
FXTRACT instruction, 182 
FYL2X instruction, 184 
FYL2XP1 instruction, 184 

Gaussian elimination, 135-139, 177
zero pivot problem in, 147-149 
General registers, 55-57 

Hardware, 5
benchmarks for, 21-24
data type representations on, 32
for matrix operations, 111
with pre-8087 software, 17
speed of, 12

IBM 3081, 23
IBM Personal Computers (PCs) 
benchmarks on, 22 
compiled BASIC on, 18, 19
CPU on, 7 
socket for 8087 in, 5 
Identity matrices, 140 
Immediate operands, 57 
Implicit operands, 45 
IMSL Library, 176
INC instruction, 60 
Independent variables, 216 
Indexes in matrices, 114 
partial pivoting with, 153,156 
Indexing 
of arrays, 86-88 
of memory, 56-57 
of multiple arrays, 91-93 
Index registers, 56, 57 
Infinity, 28, 36 
Initialization, 223-225 
Inner products 

in correlation coefficients, 215 
in matrices, 123-133 
Installation of 8087 chips, 5 
Instructions and instruction sets, 7, 10, 39
advanced, 179-200 
arithmetic, 43-47 
in assembly language, 58-68 
classes of, 8-9 
comparison, 48-50 
in co-processor environments, 25-26 
data transfer, 40-43 
Integer (word integer) data, 9, 31 
indefinite, 37 
overflows of, 109 
representation of, 35 
transfer instructions for, 42-43 
Integer format arguments, 44 
Integration, numerical, 205-207 
Intel microprocessor chips, 5 
see also 8086 microprocessor chips; 8087 Numeric Data Processor chips; 8088 microprocessor chips
Interactive programming 
in compiled BASIC, 80-81 
in interpreted BASIC, 79-80 
Interpreted BASIC, see BASIC interpreters 
Interpreters, 13 
BASIC, 17-18 
benchmarks using, 22 
see also BASIC interpreters
Interrupts
instructions for, 198-199 
by unmasked exceptions, 29 
Inverse trigonometric functions, 194-195 
Invocation time, 13 

JCXZ instruction, 63 
JMP instruction, 62-63 

Labels, 58-59
Library collections (of programs), 176 
Linear equations and systems, 114-115, 133-134, 143, 175-178
8087 subroutines for solving, 153-166
back substitution after Crout reduction for, 166-172
Crout decomposition of, 150-153
Gaussian elimination for, 138-139
LU decomposition of, 149-150
matrix inversions for, 173-175
matrix manipulation for, 134-137
solving, 137-138
zero pivots in, 147-149
Linear operations, 14, 17, 19, 20
LINK utility, 77
Loading
of assembly language programs, 77
of subroutines into compiled BASIC, 78-79
of subroutines into interpreted BASIC, 77-78
Logarithms, 185
Long integer data, 9, 31
Long real data, see Double precision data
LOOP instruction, 63-64
Loops
counter testing in, 177-178 
endless, 205 
for optimization, 177 
LU decomposition, 149-150 

Machine language
errors in programs in, 100 
subroutines in, 72-74 
MACRO assembler, 178 
Mainframe computers, 3, 19, 21
benchmarks on, 23
MAT functions, 111
Mathematical instructions, 7, 43-47, 181-195
Matrices, 14
in canned statistical analysis program, 226 
special types of, 176 
Matrix multiplication, 115, 123-133
Matrix multiplication program
8087 speed of, 4
as benchmark, 21
execution time for, 14, 17
speed of, in assembly language, 20 
Matrix operations, 111, 114-115, 143, 175-178
8087 subroutines for solving systems of linear equations, 153-166
back substitution after Crout reduction, 166-172
Crout decomposition, 150-153
Gaussian eliminations, 138-139
inversion, 139-141, 173-175
LU decomposition, 149-150
manipulation, 134-137
memory access for, 115-118
multiplication and inner products, 123-133
programs for, 111-113
scalar and element-by-element, 119-120
solving multiple linear systems, 137-138
transpositions, 121-123
zero pivot problem in, 147-149
Means (averages), 214 
Memory
addressed by 8088 chip, 8 
addressing of, 56-58 
data transfer instructions for, 40-43 
for matrix operations, 115-118 
segments in, 55, 65-66 
Menus, 219-220 
Microprocessors
8088 chip, 7
co-processors, 25-26
speed of, 22
Microsoft, 12
conversions of software in format of, 17
Minicomputers, 3, 19, 21
benchmarks on, 23 
Modular programming, 218 
Modules, 218-227 
MOV instruction, 60 
MUL instruction, 60 
Multiple regression, 216-217 
in canned statistical analysis program, 223
Multiplication 
instructions for, 46 
matrix, 115, 123-133
shifting versus, 157-158, 177
see also Matrix multiplication program
Multiple arrays, indexing of, 91-93

NAN (Not-A-Number) data, 37 
"Native code" compilers, 19 
NEAR procedures, 127, 158
Negative numbers, 35
Non-linear operations, 14, 18-20, 200, 201-211
Normal distribution, 215
Normalized floating point format, 33 
Not-A-Number (NAN) data, 37 
Numbers
floating point, 29-30
representation of, 11-12, 16, 32-35
rounding of, 27-28
special types of, 35-37
types of, 9-10, 30-32
Number systems, 29 
Numerical differentiation, 202-205, 207 
Numerical integration, 205-207 

Object modules, 77 
Operands 
immediate, 57 
stack, 45 
Optimization 
of loops, 177 
non-linear, 209-211 
Ordinary least squares, 217 
OR instruction, 60 
Overflow errors, 18, 109
Overhead, time spent on, 4 

Packaged (canned) programs, 15-16, 217-227
Packed decimal data, 10, 31, 32 
in commercial data processing, 238 
indefinite, 37
in program to add vectors of strings, 241
representation of, 35
transfer instructions for, 42-43 
Parameters, skip, 126
Partial pivoting, 148-149, 153-156, 161, 165, 176
PCs, see IBM Personal Computers 
Personal computers 
CPUs on, 7
installation of 8087 chips in, 5
see also IBM Personal Computers 
POP instruction, 66 
Positive numbers, 35 
Pre-8087 software, 2 
compared with 8087 subroutines, 4 
hardware with, 17 
noncompatibility of, 12, 16
Precision, 3, 28-29, 89-91 
errors in, 103-109 
see also Accuracy 
Procedures, 55
NEAR and FAR, 127, 158
Processor control instructions, 9, 50, 196-197
Processors, 7-8 
co-processors, 25-26 
PROC FAR directive, 67-68
Programming
in assembly language, 53-68
in BASIC, 69-81
errors from, 99-100
matrix, advanced, 175-178
modular, 218
Programs
addition, 50-51
for addition of vectors of strings, 238-241
in assembly language, loading of, 77
bugs in, 100
compatibility of, 11-12
for linear systems and matrix inversion, advanced, 143-146, 153-166, 177-178
matrix, 111-113
packaged (canned), 15-16, 218-219
simple subroutines, 83-86
speed of, 12-13
for statistical analysis, 219-236
translation of, 13-14
using advanced instruction set, 179-180 
see also Software, Subroutines 
Projective closure, 28 
Pseudo-zero, 36 
PTR directive, 58 
PUBLIC directive, 68 
Pushdown stacks, 8, 26 
PUSH instruction, 66 

Read Only Memory (ROM) chips, 5, 17
Real-and-pop format arguments, 43-44
Real format arguments, 43
Real indefinite data, 37
Real transfer instructions, 41-42 
Registers, 8, 26-27, 39-40 
flag, 61
general, 55-57
segment, 65
Register stack, 26, 39-40
comparison instructions for, 48
Regression, 216-217
in canned statistical analysis program, 223
Relocation of subroutines, 74-77 
Representation of numbers, 11-12, 16, 32
floating point, 32-34 
integer, 35 
packed decimal, 35 
Residuals (in regression), 217 
RET instruction, 66-67 
Returns from subroutines, 66-67 
Reversed division instructions, 43 
Reversed subtraction instructions, 43 
ROM (Read Only Memory) chips, 5, 17
Rounding, 27-28 
Routines, see Subroutines 
Row vectors, 116 
R-squared statistic, 217 
SAHF instruction, 62
Scalar matrix operations, 119-120 
Scalar subroutines, 93-95 
Scientific notation, 30, 33 
Screen handling, 227 
SEGMENT directive, 67 
Segment registers, 65 
Segments, 55, 65-66 
addressing of, 74-77
Shifting, 157-158, 177
SHL instruction, 60
Short integer data, 9, 31
Short real (single precision) data, 9, 31, 33, 89
SHR instruction, 60-61 
Simultaneous linear equations, see Linear equations and systems
Sines, 190
Single precision (short real) data, 9, 31, 33, 89
SI (index) register, 56, 57 
Skip parameters, 126 
Software 
for 8087, 1
8087-compatible, 15
assembly language modules for BASIC, 19-20
BASIC, 69-81
benchmarks for, 21
compatibility of, 11-12
compilers, 18-19
ESCAPE instructions in, 10
exception-handling, 104
interpreted BASIC, 17-18
packaged programs, 15-16
pre-8087, 2, 17
speed of, 12-13
upgrading of, during installation of 8087 chips, 5
see also Programs; Subroutines 
Source programs 
compiling of, 18 
translation of, 13-14 
Sparse matrices, 176 
Special data types, 35-37 
Speed
of 8087-equipped PCs, 3-4
of assembly language, 19, 20
benchmarks of, 21-24
of matrix multiplication subroutines, 130-132
of pre-8087 software, 2 
of programs, 12-13 
SP (stack pointer) register, 56, 65, 72 
Square root program 
8087 speed of, 4 
as benchmark, 21
execution time for, 14, 17
SS (stack segment) register, 65, 129
Stack operands, 45
Stack pointers (BP and SP registers), 56, 65, 72
Stacks, 8, 26, 39-40 
arithmetic instructions on, 43-45 
comparison instructions for, 48 
used in matrix multiplication, 128, 129
Stack segment (SS) register, 65, 129
Standard deviations, 214-215 
Standard errors, 216, 217 
Statistical analysis, 213-214 
correlation in, 215-216 
descriptive statistics in, 214-215 
multiple regression in, 216-217 
program for, 219-236 
ST (stack) registers, 26, 39-40 
arithmetic instructions on, 43-45 
Status word (register), 26-27 
processor control instructions for, 196 
Strings, program for addition of vectors of, 238-241
Subroutines, 83-86 
branching and returns from, 66-67 
calling, in BASIC, 69-71 
double precision arguments in, 89-91 
errors resulting from, 99 
errors in use of, 100-103 
for linear systems and matrix inversion, advanced, 143-146, 153-166
loading of, into compiled BASIC, 78-79
loading of, into interpreted BASIC, 77-78
in machine language, 72-74
matrix, 111-113
relocation and segment addressing of, 74-77
scalar, 93-95
using advanced instruction set, 179-180
utility, 96-99
Subtraction instructions, 43, 46 
Symmetric matrices, 176 
Synchronization of co-processors, 26 

Tag word (register), 27, 36 
Tangents, 184, 188, 200 
in differentiation, 202, 208 
Temporary real data, 9, 31, 89 
TEST line, 26
Transcendental instructions, 9, 183-185
invalid data fed into, 109
Translation of source programs, 13-14
Translation time, 13
eliminated by compilers, 18
Translators
8087 accuracy available to, 15
8087-compatible, 17
compilers, 18-19
number representations in, 12
speed of, 12-13
see also Compilers; Interpreters
Transpositions of matrices, 121-123 
Trigonometric functions, 188-194 
inverse, 194-195 
t-statistic, 216
"Two's complement" format, 35

Unary operations, 95-96 
Underflows, 36 
Unnormal data, 36 
Utility subroutines, 96-99 


Variables
correlation coefficients for, 215-216 
in multiple regression, 216-217 
Variance, 214 
VAX-780, 23 

WAIT instruction, 26 
Word integer data, see Integer data 
Word processing, 4 
Words, addressing of, 56

Zero, 34, 36
in matrix manipulations, 135, 138, 140, 147-149, 152, 176


Diskette Files to Accompany 8087 
Applications and Programming for the IBM PC 
and Other PCs 

The diskette files accompanying 8087 Applications and Programming for the 
IBM PC and Other PCs are described in this note. Complete descriptions 
of the programs and their operation appear in the text. This note is limited 
to a technical description of the diskette files. 

If you have not already done so, please read the copyright notice, 
liability disclaimer, and the section on the inherent dangers in using 
machine language programs. 

The programs require one single-sided disk drive, 64K of memory, a 
copy of the operating system version 1.1 or 2.0 and, for the most part, 
an 8087. The programs are distributed on a "flippy diskette." (Each side 
of the diskette is equivalent to one regular single-sided diskette.) The 
diskette is not copy protected. 

The assembly language programs in the text appear in the following 
files: 

VECTOR.ASM    Chapter 9 programs: basic vector routines
MATRIX.ASM    Chapter 10 programs: basic matrix routines
MATADV.ASM    Chapter 11 programs: advanced matrix routines
TRANS.ASM     Chapter 12 programs: transcendental routines
BCD.ASM       Chapter 15 program: compiler version
BCDI.ASM      Chapter 15 program: interpreter version
CONVERT.ASM   Appendix programs: Intel/Microsoft conversion routines

These files are almost, but not exactly, identical to the programs
appearing in the book. The differences are:

1. All 8087 mnemonics have been replaced with the equivalent 8088 
mnemonics so that the programs can be assembled by assemblers 
which do not recognize the 8087 names. The 8087 mnemonics have 
a semicolon placed in front of them to turn them into comments. 
(For the information of IBM PC users, all these files can be assembled 
using version 1.0 of the IBM Macro Assembler.) 

2. The programs from each chapter have been grouped together in
one file. Slight rearrangements of CSEG/ENDS statements have been
made. Some statement labels have been modified to eliminate duplicate
definitions. For example, you will see labels "NEXT01",
"NEXT02", and so forth, instead of "NEXT", "NEXT", and so
forth.


Not everyone has an assembler program. As a convenience, each file
above has been assembled into an object module with the extension
".OBJ" replacing the extension ".ASM".

Since linking a machine language program for use with interpreted
BASIC is time consuming, we have translated each of the files into a file
with the extension ".SAV". These files can be loaded directly into
interpreted BASIC using the BLOAD command. (Since program BCD can be
used only with the compiler, there is no "BCD.SAV" file. Use "BCDI.SAV"
instead.)
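The .SAV files are memory images meant for BLOAD. As a sketch of what such a file contains, the Python below reads the header that BSAVE files from IBM/Microsoft BASIC are generally understood to carry: a marker byte &HFD followed by three little-endian 16-bit words giving the segment, offset, and length of the saved block. That layout is an assumption of this example, not something stated in this note, and `read_bsave_header` is a hypothetical helper. It can be useful for seeing how many bytes a routine will occupy before you choose a load offset.

```python
import struct

BSAVE_MARKER = 0xFD  # assumed first byte of a BSAVE image


def read_bsave_header(image: bytes) -> dict:
    """Split a BSAVE-style image into its header fields and payload.

    Assumed layout (hypothetical helper, not part of the diskette):
      byte 0      marker 0xFD
      bytes 1-2   segment at BSAVE time (little-endian)
      bytes 3-4   offset at BSAVE time (little-endian)
      bytes 5-6   number of data bytes that follow
    """
    if len(image) < 7 or image[0] != BSAVE_MARKER:
        raise ValueError("not a BSAVE image")
    segment, offset, length = struct.unpack_from("<HHH", image, 1)
    payload = image[7:7 + length]
    if len(payload) != length:
        raise ValueError("image is shorter than its header claims")
    return {"segment": segment, "offset": offset, "length": length, "payload": payload}


if __name__ == "__main__":
    # A tiny synthetic image: three bytes saved from 1234:0010.
    sample = bytes([BSAVE_MARKER]) + struct.pack("<HHH", 0x1234, 0x0010, 3) + b"\x90\x90\xcb"
    info = read_bsave_header(sample)
    print(f"segment={info['segment']:04X} offset={info['offset']:04X} length={info['length']}")
```

To inspect a real file, pass the function the bytes from `open("CONVERT.SAV", "rb").read()`; if the layout assumption does not hold for that file, the marker check will raise `ValueError`.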


The memory map produced by the LINK program appears in files with
extension ".MAP". Use the information in these files to find the offset
of a particular routine. If you are going to load more than one file into
BASIC, remember that the relocation scheme explained in the book
requires the routines to be loaded at an address ending in hexadecimal
zero. That is, you can say BLOAD "CONVERT.SAV",&H10, but you
should not try BLOAD "CONVERT.SAV",&H11.
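Why &H10 is acceptable and &H11 is not comes down to 8088 segmented addressing. A physical address is segment * 16 + offset, so an offset that is a multiple of 16 (one ending in hexadecimal zero) can be folded entirely into the segment, leaving the routine at offset zero of a new segment. The Python sketch below shows only that arithmetic. It assumes, as an illustration rather than a description of the book's actual code, that the relocation scheme depends on being able to make this adjustment; `physical_address` and `rebase_to_zero_offset` are hypothetical names.

```python
def physical_address(segment: int, offset: int) -> int:
    """20-bit physical address the 8088 forms from segment:offset."""
    return ((segment << 4) + offset) & 0xFFFFF


def rebase_to_zero_offset(segment: int, offset: int) -> tuple:
    """Re-express segment:offset as new_segment:0000, if that is possible.

    This works only when the offset is a multiple of 16, that is, when
    its last hexadecimal digit is zero.
    """
    if offset % 16 != 0:
        raise ValueError(f"offset {offset:#06x} does not end in hexadecimal zero")
    new_segment = (segment + offset // 16) & 0xFFFF
    return new_segment, 0


if __name__ == "__main__":
    seg = 0x1000  # hypothetical DEF SEG value
    print([hex(v) for v in rebase_to_zero_offset(seg, 0x10)])
    try:
        rebase_to_zero_offset(seg, 0x11)
    except ValueError as err:
        print(err)
```

Both forms of the &H10 case name the same byte of memory: 1000:0010 and 1001:0000 are the same physical address. No such segment exists for an offset of &H11.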

For an example of loading multiple assembly language programs into 
an interpreted BASIC program, see the program "STAT87.BAS". 

Remember that the assembly language routines expect all data to be
in Intel format. If you are mixing these routines with pre-8087
programs, which store numbers in Microsoft format, you must convert
the data. For an example of using conversion routines, see the program
"STATPRE.BAS".
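To make the Intel/Microsoft difference concrete, here is a Python sketch that converts one single-precision value from Microsoft binary format, as used by pre-8087 versions of BASIC, into the 8087's short real format. It is not the diskette's CONVERT routine. It assumes the usual Microsoft single-precision layout: four bytes, least significant first, with the exponent in the last byte (a value is 1.m times 2 to the power exponent - 129, and an exponent byte of 0 means zero) and the sign in the top bit of the third byte. The name `mbf_single_to_intel` is hypothetical, and double precision is not handled.

```python
import math
import struct


def mbf_single_to_intel(mbf: bytes) -> bytes:
    """Convert a 4-byte Microsoft-format single to a 4-byte 8087 short real.

    Hypothetical helper. Both byte strings are least significant byte first.
    Assumed MBF layout: mbf[3] = exponent (value = 1.m * 2**(exponent - 129),
    0 means zero), mbf[2] bit 7 = sign, remaining 23 bits = mantissa.
    """
    if len(mbf) != 4:
        raise ValueError("an MBF single-precision value is 4 bytes long")
    exponent = mbf[3]
    if exponent == 0:
        # MBF treats any value with a zero exponent byte as zero.
        return struct.pack("<f", 0.0)
    negative = bool(mbf[2] & 0x80)
    # 23 stored mantissa bits; the leading 1 bit is implicit in both formats.
    mantissa = ((mbf[2] & 0x7F) << 16) | (mbf[1] << 8) | mbf[0]
    value = math.ldexp(1.0 + mantissa / (1 << 23), exponent - 129)
    # Exponent bytes 1 and 2 fall below the normal short-real range;
    # struct.pack rounds those values to denormals.
    return struct.pack("<f", -value if negative else value)


if __name__ == "__main__":
    ten_mbf = bytes([0x00, 0x00, 0x20, 0x84])  # 10.0 in Microsoft format
    print(mbf_single_to_intel(ten_mbf).hex())
```

Because both formats keep a 23-bit mantissa with a hidden leading 1, the conversion for ordinary values amounts to moving the sign bit and lowering the exponent by 2. The sketch goes through a Python float only to keep the bit manipulation readable.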

The "8087 Statistical Analysis Program" appears in two versions.
STAT87.BAS is the program as it appears in the text. STATPRE.BAS
includes calls to the conversion routines, so you can use the program
immediately with pre-8087 versions of BASIC. Module 12 of these
programs includes "CLEAR" and "DEF SEG" statements that allow the
programs to run in systems with 64K of memory. If you have more
memory, you may want to change these statements to increase the space
available for data storage. The programs are standard text files. If you
load a program into BASIC and then SAVE it, the SAVEd version will
LOAD much faster than the original. If you eliminate the REMark
statements from the program, the space for data storage will increase.

The following BASIC programs also appear on the diskette. Remember 
to modify these programs to reflect your own data and functions. 


CPP.BAS        Crout decomposition
GPP.BAS        Gauss decomposition
SOLP.BAS       Solution following Crout decomposition
DIFFER.BAS     Numerical differentiation
INTEGRAT.BAS   Numerical integration
ZERO.BAS       Solve non-linear equation
MAX.BAS        Maximize non-linear function

“A pleasure to read ... the author’s style is excellent and
the explanation of what the 8087 can do is terrific ...”

— Peter Norton, President of NORTON UTILITIES, author of Inside the IBM 


Now Large Scale Numerical Computing

Is Made Faster and Easier Than Ever Before With ...

8087 Applications and Programming 
For The IBM PC And Other PCs 

Richard Startz


Finally, a book that gives you a clear, complete explanation of
the number crunching 8087 microprocessor for the IBM PC
and other compatible machines! Whether you’re a “program
writer” or “program user,” this unique guide helps you
understand how the 8087 chip works, what it does, and how fast
it processes!

• For Novice and Potential Chip Users, the text includes a
nontechnical overview of the capabilities of the 8087, featuring
speed benchmarking and guidelines for buying compatible
8087 software!

• For Program Writers Who Want to Know Intimate Details
About The Chip, the text includes a complete section of the
8087’s instructions, with special attention to linking
assembly language and BASIC programs!

• For Program Users, the text includes a wide variety of
ready-to-use “cookbook” applications designed to give you
answers fast and easy, including the “8087’s Statistical
Analysis Program”!

CONTENTS 

Turning Minutes Into Seconds/The Intel 8087 Chip/Buying
and Building 8087-Compatible Software/Benchmarks/Introduction
to 8087 Architecture/Simple Instruction Set/Introduction
to 8088 Assembly Language Programming/BASIC
and the 8087/Simple 8087 Routines/Basic Matrix Operations/
Linear Systems and Matrix Inversion: More Advanced
Computational Techniques/Advanced Instruction Set/Non-Linear
Methods/Statistical Analysis and Program Canning/Commercial
Data Processing/Appendices/Index/Diskette Files to
Accompany 8087 Applications and Programming

ALSO AVAILABLE ... OPTIONAL DISKETTE

This accompanying diskette includes all programs from the text. See
insert inside this book for ordering information.


ISBN □-fl'^303-MS0-7l 



